Age | Commit message (Collapse) | Author |
|
commit 23adbe12ef7d3d4195e80800ab36b37bee28cd03 upstream.
The kernel has no concept of capabilities with respect to inodes; inodes
exist independently of namespaces. For example, inode_capable(inode,
CAP_LINUX_IMMUTABLE) would be nonsense.
This patch changes inode_capable to check for uid and gid mappings and
renames it to capable_wrt_inode_uidgid, which should make it more
obvious what it does.
Fixes CVE-2014-4014.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c41570c9d29764f797fa35490d72b7395a0105c3 upstream.
When defragging a very large file, the cluster variable can wrap its 32-bit
signed int type and become negative, which eventually gets passed to
btrfs_force_ra() as a very large unsigned long value. On 32-bit platforms,
this eventually results in an Oops from the SLAB allocator.
Change the cluster and max_cluster signed int variables to unsigned long to
match the readahead functions. This also allows the min() comparison in
btrfs_defrag_file() to work as intended.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d71f290b4e98a39f49f2595a13be3b4d5ce8e1f1 upstream.
Specify the maximum stack size for arches where the stack grows upward
(parisc and metag) in asm/processor.h rather than hard coding in
fs/exec.c so that metag can specify a smaller value of 256MB rather than
1GB.
This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased
beyond a safe value by root. E.g. when starting a process after running
"ulimit -H -s unlimited" it will then attempt to use a stack size of the
maximum 1GB which is far too big for metag's limited user virtual
address space (stack_top is usually 0x3ffff000):
BUG: failure at fs/exec.c:589/shift_arg_pages()!
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream.
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.
We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.
Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 27b11428b7de097c42f205beabb1764f4365443b upstream.
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream.
After setting ACL for directory, I got two problems that caused
by the cached zero-length default posix acl.
This patch make sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.
Thanks for Christoph Hellwig's advice.
First problem:
............ hang ...........
Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G OE
3.15.0-rc1+ #15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>] [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00 EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS: 0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320] ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320] ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320] 0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320] [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320] [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320] [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320] [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320] [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320] [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320] [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320] [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320] [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320] [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320] [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320] [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320] [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320] [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320] RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4cb57e3032d4e4bf5e97780e9907da7282b02b0c upstream.
Mainly to ensure that we don't leave any hanging timers.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5694c93e6c4954fa9424c215f75eeb919bddad64 upstream.
Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6b6751f7feba68d8f5c72b72cc69a1c5a625529c upstream.
autofs needs to be able to see private data dentry flags for its dentrys
that are being created but not yet hashed and for its dentrys that have
been rmdir()ed but not yet freed. It needs to do this so it can block
processes in these states until a status has been returned to indicate
the given operation is complete.
It does this by keeping two lists, active and expring, of dentrys in
this state and uses ->d_release() to keep them stable while it checks
the reference count to determine if they should be used.
But with the recent lockref changes dentrys being freed sometimes don't
transition to a reference count of 0 before being freed so autofs can
occassionally use a dentry that is invalid which can lead to a panic.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 50c6e282bdf5e8dabf8d7cf7b162545a55645fd9 upstream.
Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead of one representing the mode bits.
Instead of adding boilerplat to all filesystems put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>,
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 754320d6e166d3a12cb4810a452bde00afbd4e9a upstream.
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
but it doesn't hold when failure happens right after aio_setup_vectored_rw().
Fix that in a such way to avoid hairy goto.
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 404ca80eb5c2727d78cd517d12108b040c522e12 upstream.
A va_list needs to be copied in case it needs to be used twice.
Thanks to Hugh for debugging this issue, leading to various panics.
Tested:
lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
'produce_core' is simply : main() { *(int *)0 = 1;}
lpq84:~# ./produce_core
Segmentation fault (core dumped)
lpq84:~# dmesg | tail -1
[ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)
After fix :
lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
lpq83:~# ./produce_core
Segmentation fault
lpq83:~# dmesg | tail -1
[ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Hugh Dickins <hughd@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a2a4dc494a7b7135f460e38e788c4a58f65e4ac3 upstream.
Commit 9e30cc9595303b27b48 removed an internal mount. This
has the side-effect that rootfs now has FSID 0. Many
userspace utilities assume that st_dev in struct stat
is never 0, so this change breaks a number of tools in
early userspace.
Since we don't know how many userspace programs are affected,
make sure that FSID is at least 1.
References: http://article.gmane.org/gmane.linux.kernel/1666905
References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557
Signed-off-by: Thomas Bächler <thomas@archlinux.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c11f1df5003d534fd067f0168bfad7befffb3b5c upstream.
Problem reported in Red Hat bz 1040329 for strict writes where we cache
only when we hold oplock and write direct to the server when we don't.
When we receive an oplock break, we first change the oplock value for
the inode in cifsInodeInfo->oplock to indicate that we no longer hold
the oplock before we enqueue a task to flush changes to the backing
device. Once we have completed flushing the changes, we return the
oplock to the server.
There are 2 ways here where we can have data corruption
1) While we flush changes to the backing device as part of the oplock
break, we can have processes write to the file. These writes check for
the oplock, find none and attempt to write directly to the server.
These direct writes made while we are flushing from cache could be
overwritten by data being flushed from the cache causing data
corruption.
2) While a thread runs in cifs_strict_writev, the machine could receive
and process an oplock break after the thread has checked the oplock and
found that it allows us to cache and before we have made changes to the
cache. In that case, we end up with a dirty page in cache when we
shouldn't have any. This will be flushed later and will overwrite all
subsequent writes to the part of the file represented by this page.
Before making any writes to the server, we need to confirm that we are
not in the process of flushing data to the server and if we are, we
should wait until the process is complete before we attempt the write.
We should also wait for existing writes to complete before we process
an oplock break request which changes oplock values.
We add a version specific downgrade_oplock() operation to allow for
differences in the oplock values set for the different smb versions.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit dd20908a8a06b22c171f6c3fcdbdbd65bed07505 upstream.
it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
during lockd_up
commit 679b033df48422191c4cac52b610d9980e019f9b upstream.
We had a Fedora ABRT report with a stack trace like this:
kernel BUG at net/sunrpc/svc.c:550!
invalid opcode: 0000 [#1] SMP
[...]
CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
RIP: 0010:[<ffffffffa0305fa8>] [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP: 0018:ffff88003f9b9de0 EFLAGS: 00010206
RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
FS: 00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
Stack:
ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
Call Trace:
[<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
[<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
[<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
[<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
[<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
[<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
[<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
[<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
[<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
[<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
[<ffffffff811c3f99>] ? putname+0x29/0x40
[<ffffffff811b9569>] SyS_write+0x49/0xa0
[<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
[<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
RIP [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP <ffff88003f9b9de0>
Evidently, we created some lockd sockets and then failed to create
others. make_socks then returned an error and we tried to tear down the
svc, but svc->sv_permsocks was not empty so we ended up tripping over
the BUG() in svc_destroy().
Fix this by ensuring that we tear down any live sockets we created when
socket creation is going to return an error.
Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...)
Reported-by: Raphos <raphoszap@laposte.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit fa8a53c39f3fdde98c9eace6a9b412143f0f6ed6 upstream.
As reported by Tang Chen, Gu Zheng and Yasuaki Isimatsu, the following issues
exist in the aio ring page migration support.
As a result, for example, we have the following problem:
thread 1 | thread 2
|
aio_migratepage() |
|-> take ctx->completion_lock |
|-> migrate_page_copy(new, old) |
| *NOW*, ctx->ring_pages[idx] == old |
|
| *NOW*, ctx->ring_pages[idx] == old
| aio_read_events_ring()
| |-> ring = kmap_atomic(ctx->ring_pages[0])
| |-> ring->head = head; *HERE, write to the old ring page*
| |-> kunmap_atomic(ring);
|
|-> ctx->ring_pages[idx] = new |
| *BUT NOW*, the content of |
| ring_pages[idx] is old. |
|-> release ctx->completion_lock |
As above, the new ring page will not be updated.
Fix this issue, as well as prevent races in aio_ring_setup() by holding
the ring_lock mutex during kioctx setup and page migration. This avoids
the overhead of taking another spinlock in aio_read_events_ring() as Tang's
and Gu's original fix did, pushing the overhead into the migration code.
Note that to handle the nesting of ring_lock inside of mmap_sem, the
migratepage operation uses mutex_trylock(). Page migration is not a 100%
critical operation in this case, so the ocassional failure can be
tolerated. This issue was reported by Sasha Levin.
Based on feedback from Linus, avoid the extra taking of ctx->completion_lock.
Instead, make page migration fully serialised by mapping->private_lock, and
have aio_free_ring() simply disconnect the kioctx from the mapping by calling
put_aio_ring_file() before touching ctx->ring_pages[]. This simplifies the
error handling logic in aio_migratepage(), and should improve robustness.
v4: always do mutex_unlock() in cases when kioctx setup fails.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4991a628a789dc5954e98e79476d9808812292ec upstream.
A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.
Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.
Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.
Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6e6358fc3c3c862bfe9a5bc029d3f8ce43dc9765 upstream.
We haven't taken i_mutex yet, so we need to use i_size_read().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 622cad1325e404598fe3b148c3fa640dbaabc235 upstream.
The function ext4_update_i_disksize() is used in only one place, in
the function mpage_map_and_submit_extent(). Move its code to simplify
the code paths, and also move the call to ext4_mark_inode_dirty() into
the i_data_sem's critical region, to be consistent with all of the
other places where we update i_disksize. That way, we also keep the
raw_inode's i_disksize protected, to avoid the following race:
CPU #1 CPU #2
down_write(&i_data_sem)
Modify i_disk_size
up_write(&i_data_sem)
down_write(&i_data_sem)
Modify i_disk_size
Copy i_disk_size to on-disk inode
up_write(&i_data_sem)
Copy i_disk_size to on-disk inode
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ec4cb1aa2b7bae18dd8164f2e9c7c51abcf61280 upstream.
When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:
WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()
CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
[<ffffffff816311b0>] dump_stack+0x19/0x1b
[<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
[<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
[<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
[<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
[<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
[<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
[<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
[<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
[<ffffffff811935ce>] generic_setxattr+0x6e/0x90
[<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
[<ffffffff8119421e>] setxattr+0x13e/0x1e0
[<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118c65c>] ? fget_light+0x3c/0x130
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
[<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
[<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:
CPU1 CPU2
ext4_xattr_release_block() ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
le32_add_cpu(&BHDR(bh)->h_refcount, -1);
unlock_buffer(bh);
lock_buffer(bh);
/* True */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
get_bh(bh);
ext4_free_blocks()
...
jbd2_journal_forget()
jbd2_journal_unfile_buffer()
-> JH is gone
error = ext4_handle_dirty_xattr_block(handle, inode, bh);
-> triggers the warning
We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 9503c67c93ed0b95ba62d12d1fd09da6245dbdd6 upstream.
ext4_end_bio() currently throws away the error that it receives. Chances
are this is part of a spate of errors, one of which will end up getting
the error returned to userspace somehow, but we shouldn't take that risk.
Also print out the errno to aid in debug.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4adb6ab3e0fa71363a5ef229544b2d17de6600d7 upstream.
When we try to get 2^32-1 block of the file which has the extent
(ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON
in ext4_ext_put_gap_in_cache().
To avoid the problem, ext4_map_blocks() needs to check the file logical block
number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot
handle 2^32-1 because the maximum file logical block number is 2^32-2.
Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid.
So ext4_map_blocks() should also return the same errno.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f81c20158f8d5f7938d5eb86ecc42ecc09273ce6 upstream.
Commit 9548906b2bb7 ('xattr: Constify ->name member of "struct xattr"')
missed that ocfs2 is calling kfree(xattr->name). As a result, kernel
panic occurs upon calling kfree(xattr->name) because xattr->name refers
static constant names. This patch removes kfree(xattr->name) from
ocfs2_mknod() and ocfs2_symlink().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Tested-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f7cf4f5bfe073ad792ab49c04f247626b3e38db6 upstream.
Do not put bh when buffer_uptodate failed in ocfs2_write_block and
ocfs2_write_super_or_backup, because it will put bh in b_end_io.
Otherwise it will hit a warning "VFS: brelse: Trying to free free
buffer".
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ded2cf71419b9353060e633b59e446c42a6a2a09 upstream.
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node. After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value. This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.
old recovery master: new recovery master:
dlm_remaster_locks()
become recovery master for
another dead node.
dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
return -EAGAIN;
}
dlm_set_reco_master(dlm, br->node_idx);
dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
will hang in dlm_remaster_locks() for
request dlm locks info
Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.
A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 34aa8dac482f1358d59110d5e3a12f4351f6acaa upstream.
This issue was introduced by commit 800deef3f6f8 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry. The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this. Sunil advised reverting it back, but the old version was
also not right. At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic. So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.
Another concern is that this seemes can not happen because the "tmpq"
list should not be empty. Let me describe how.
old lock resource owner(node 1): migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
dlm_pick_migration_target() {
pick node 2 as target as its lock is the first one
in granted list.
}
dlm_migrate_lockres() {
dlm_mark_lockres_migrating() {
res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
//after the above code, we can not dirty lockres any more,
// so dlm_thread shuffle list will not run
downconvert lock from EX to NR
upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.
// will set lockres RES_MIGRATING flag, the following
// lock/unlock can not run
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data()
for (i=0; i<mres->num_locks; i++)
if (ml->node == dlm->node_num)
for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
list_for_each_entry(lock, tmpq, list)
if (lock) break; <<< lock is invalid as grant list is empty.
}
if (lock->ml.node != ml->node)
BUG() >>> crash here
}
I see the above locks status from a vmcore of our internal bug.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 01d8885785a60ae8f4c37b0ed75bdc96d0fc6a44 upstream.
jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
The -ENOENT is due to readdir calling dir_emit on the same entry twice.
If the dir_emit callback sleeps and the tree is changed underneath us,
we won't be able to trust deh_offset(deh) anymore. We need to save
next_pos before we might sleep so we can find the next entry.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3758cf7e14b753838fe754ede3862af10b35fdac upstream.
...otherwise the logic in the timeout handling doesn't work correctly.
Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 2b9056359889c78ea5decb5b654a512c2e8a945c upstream.
When stopping nfsd, I got BUG messages, and soft lockup messages,
The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
and destroy_client().
This patch just let nfsd traversing unconfirmed client through
hash-table instead of rbtree.
[ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
[ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
[last unloaded: nfsd]
[ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF O
3.14.0-rc8+ #2
[ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
ffff8800797f6000
[ 2325.022982] RIP: 0010:[<ffffffff8133c18c>] [<ffffffff8133c18c>]
rb_erase+0x14c/0x390
[ 2325.022982] RSP: 0018:ffff8800797f7d98 EFLAGS: 00010246
[ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
0000000000000000
[ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
ffff880079f4c810
[ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
ffff88007964fc70
[ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
ffff880079f4c800
[ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
ffff880079f4c860
[ 2325.022982] FS: 0000000000000000(0000) GS:ffff88007f900000(0000)
knlGS:0000000000000000
[ 2325.022982] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
00000000000006e0
[ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 2325.022982] Stack:
[ 2325.022982] ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
ffff8800797f7da8
[ 2325.022982] ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
ffff880079c1f010
[ 2325.022982] ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
ffff8800797f7df0
[ 2325.022982] Call Trace:
[ 2325.022982] [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
[ 2325.022982] [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
[nfsd]
[ 2325.022982] [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 2325.022982] [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 2325.022982] [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 2325.022982] [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 2325.022982] [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
[ 2325.022982] [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2325.022982] [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
[ 2325.022982] RIP [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] RSP <ffff8800797f7d98>
[ 2325.022982] CR2: 0000000000000000
[ 2325.022982] ---[ end trace 28c27ed011655e57 ]---
[ 228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
[ 228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
i2c_core ata_generic pata_acpi
[ 228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF O
3.14.0-rc8+ #2
[ 228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
ffff880074616000
[ 228.064539] RIP: 0010:[<ffffffff8133ba17>] [<ffffffff8133ba17>]
rb_next+0x27/0x50
[ 228.064539] RSP: 0018:ffff880074617de0 EFLAGS: 00000282
[ 228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
0000000000000014
[ 228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
ffff880074478010
[ 228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
0000000000000012
[ 228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
ffffea0001d11a00
[ 228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
ffff880074617d50
[ 228.064539] FS: 0000000000000000(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[ 228.064539] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
00000000000006f0
[ 228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 228.064539] Stack:
[ 228.064539] ffff880074617e28 ffffffffa01ab7db ffff880074617df0
ffff880074617df0
[ 228.064539] ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
0000000000000000
[ 228.064539] 0000000000000000 ffff880074617e48 ffffffffa01840b8
ffffffff81cc26c0
[ 228.064539] Call Trace:
[ 228.064539] [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
[nfsd]
[ 228.064539] [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 228.064539] [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 228.064539] [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 228.064539] [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 228.064539] [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
[ 228.064539] [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 228.064539] [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 228.064539] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7
Fixes: ac55fdc408039 (nfsd: move the confirmed and unconfirmed hlists...)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 480efaee085235bb848f1063f959bf144103c342 upstream.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3064639423c48d6e0eb9ecc27c512a58e38c6c57 upstream.
There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:
"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.
This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.
v2: Put socket on exit.
Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 9f67f189939eccaa54f3d2c9cf10788abaf2d584 upstream.
Looks like this bug has been here since these write counts were
introduced, not sure why it was just noticed now.
Thanks also to Jan Kara for pointing out the problem.
Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 04819bf6449094e62cebaf5199d85d68d711e667 upstream.
This fixes an ommission from 18032ca062e621e15683cb61c066ef3dc5414a7b
"NFSD: Server implementation of MAC Labeling", which increased the size
of the setattr error reply without increasing COMPOUND_ERR_SLACK_SPACE.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a11fcce1544df08c723d950ff0edef3adac40405 upstream.
If the entire operation fails then there's nothing to encode.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit de3997a7eeb9ea286b15879fdf8a95aae065b4f7 upstream.
This was an omission from 8c18f2052e756e7d5dea712fc6e7ed70c00e8a39
"nfsd41: SUPPATTR_EXCLCREAT attribute".
Cc: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 4c69d5855a16f7378648c5733632628fa10431db upstream.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e911b8158ee1def8153849b1641b736026b036e0 upstream.
If we interrupt the nfs4_wait_for_completion_rpc_task() call in
nfs4_run_open_task(), then we don't prevent the RPC call from
completing. So freeing up the opendata->f_attr.mdsthreshold
in the error path in _nfs4_do_open() leads to a use-after-free
when the XDR decoder tries to decode the mdsthreshold information
from the server.
Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN)
Tested-by: Steve Dickson <SteveD@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit a97f4a66d8ee4faf0f31cc4ad3f4aa0baef23fc7 upstream.
When dlm_release_lockspace(ls, 1) is invoked on a busy system
immediately after the last dlm_unlock() AST has finished it can occur
that lkb_idr_is_local() is invoked for the unlocked LKB since removal
from ls_lkbidr only occurs after the AST has returned. If that happens
dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing
the lockspace. Fix this race condition by changing lkb_idr_is_local()
such that it only returns true for LKB's that have not yet been
unlocked.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3ead9578443b66ddb3d50ed4f53af8a0c0298ec5 upstream.
@wait is a local variable, so if we don't remove it from the wait queue
list, later wake_up() may end up accessing invalid memory.
This was spotted by eyes.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 13b546d96207c131eeae15dc7b26c6e7d0f1cad7 upstream.
We triggered soft-lockup under stress test on 2.6.34 kernel.
BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
...
[<bf09a4d4>] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
[<bf09a528>] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
[<bf0a1350>] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
[<bf0a2078>] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
[<bf09a670>] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
[<bf09dc48>] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
[<bf097d8c>] (jffs2_write_end+0x198/0x2c0 [jffs2])
[<c00e00a4>] (generic_file_buffered_write+0x158/0x200)
[<c00e14f4>] (__generic_file_aio_write+0x3a4/0x414)
[<c00e15c0>] (generic_file_aio_write+0x5c/0xbc)
[<c012334c>] (do_sync_write+0x98/0xd4)
[<c0123a84>] (vfs_write+0xa8/0x150)
[<c0123d74>] (sys_write+0x3c/0xc0)]
Fix this by adding a cond_resched() in the while loop.
[akpm@linux-foundation.org: don't initialize `ret']
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 41bf1a24c1001f4d0d41a78e1ac575d2f14789d7 upstream.
mounting JFFS2 partition sometimes crashes with this call trace:
[ 1322.240000] Kernel bug detected[#1]:
[ 1322.244000] Cpu 2
[ 1322.244000] $ 0 : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
[ 1322.252000] $ 4 : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
[ 1322.260000] $ 8 : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
[ 1322.268000] $12 : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
[ 1322.276000] $16 : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
[ 1322.284000] $20 : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
[ 1322.292000] $24 : 0000000000000004 0000000000000000
[ 1322.300000] $28 : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
[ 1322.308000] Hi : 000000000000023c
[ 1322.312000] Lo : 000000000003f802
[ 1322.316000] epc : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
[ 1322.320000] Not tainted
[ 1322.324000] ra : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.332000] Status: 5400f8e3 KX SX UX KERNEL EXL IE
[ 1322.336000] Cause : 00800034
[ 1322.340000] PrId : 000c1004 (Netlogic XLP)
[ 1322.344000] Modules linked in:
[ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
[ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
0000008600000000 0000000000000000 0000000000000000 0000000000000000
0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
...
[ 1322.424000] Call Trace:
[ 1322.428000] [<ffffffffc039a9f8>] check_tn_node+0x88/0x3b0
[ 1322.432000] [<ffffffffc039c0b0>] jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.440000] [<ffffffffc039cf90>] jffs2_do_crccheck_inode+0x70/0xd0
[ 1322.448000] [<ffffffffc03a1b80>] jffs2_garbage_collect_pass+0x160/0x870
[ 1322.452000] [<ffffffffc03a392c>] jffs2_garbage_collect_thread+0xdc/0x1f0
[ 1322.460000] [<ffffffffc01541c8>] kthread+0xb8/0xc0
[ 1322.464000] [<ffffffffc0106d18>] kernel_thread_helper+0x10/0x18
[ 1322.472000]
[ 1322.472000]
Code: 67bd0050 94a4002c 2c830001 <00038036> de050218 2403fffc 0080a82d 00431824 24630044
[ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---
The variable csize in structure jffs2_tmp_dnode_info is of type uint16_t, but it
is used to hold the compressed data length(csize) which is declared as uint32_t.
So, when the value of csize exceeds 16bits, it gets truncated when assigned to
tn->csize. This is causing a kernel BUG.
Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.
Signed-off-by: Ajesh Kunhipurayil Vijayan <ajesh@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3367da5610c50e6b83f86d366d72b41b350b06a2 upstream.
Creating a large file on a JFFS2 partition sometimes crashes with this call
trace:
[ 306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
[ 306.488000] Oops[#1]:
[ 306.488000] Cpu 13
[ 306.492000] $ 0 : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
[ 306.500000] $ 4 : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
[ 306.508000] $ 8 : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
[ 306.516000] $12 : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
[ 306.524000] $16 : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
[ 306.532000] $20 : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
[ 306.540000] $24 : 0000000000000000 0000000000000004
[ 306.548000] $28 : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
[ 306.556000] Hi : 00000000000574a5
[ 306.560000] Lo : 6193b7a7e903d8c9
[ 306.564000] epc : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
[ 306.568000] Tainted: G W
[ 306.572000] ra : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
[ 306.580000] Status: 5000f8e3 KX SX UX KERNEL EXL IE
[ 306.584000] Cause : 00800008
[ 306.588000] BadVA : c0000000dfff8002
[ 306.592000] PrId : 000c1100 (Netlogic XLP)
[ 306.596000] Modules linked in:
[ 306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
[ 306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
...
[ 306.676000] Call Trace:
[ 306.680000] [<ffffffffc03a80a8>] jffs2_rtime_compress+0x98/0x198
[ 306.684000] [<ffffffffc0394f10>] jffs2_selected_compress+0x110/0x230
[ 306.692000] [<ffffffffc039508c>] jffs2_compress+0x5c/0x388
[ 306.696000] [<ffffffffc039dc58>] jffs2_write_inode_range+0xd8/0x388
[ 306.704000] [<ffffffffc03971bc>] jffs2_write_end+0x16c/0x2d0
[ 306.708000] [<ffffffffc01d3d90>] generic_file_buffered_write+0xf8/0x2b8
[ 306.716000] [<ffffffffc01d4e7c>] __generic_file_aio_write+0x1ac/0x350
[ 306.720000] [<ffffffffc01d50a0>] generic_file_aio_write+0x80/0x168
[ 306.728000] [<ffffffffc021f7dc>] do_sync_write+0x94/0xf8
[ 306.732000] [<ffffffffc021ff6c>] vfs_write+0xa4/0x1a0
[ 306.736000] [<ffffffffc02202e8>] SyS_write+0x50/0x90
[ 306.744000] [<ffffffffc0116cc0>] handle_sys+0x180/0x1a0
[ 306.748000]
[ 306.748000]
Code: 020b202d 0205282d 90a50000 <90840000> 14a40038 00000000 0060602d 0000282d 016c5823
[ 306.760000] ---[ end trace 79dd088435be02d0 ]---
Segmentation fault
This crash is caused because the 'positions' is declared as an array of signed
short. The value of position is in the range 0..65535, and will be converted
to a negative number when the position is greater than 32767 and causes a
corruption and crash. Changing the definition to 'unsigned short' fixes this
issue
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ad6599ab3ac98a4474544086e048ce86ec15a4d1 upstream.
Xfstests generic/311 and shared/298 fail when run on a bigalloc file
system. Kernel error messages produced during the tests report that
blocks to be freed are already on the to-be-freed list. When e2fsck
is run at the end of the tests, it typically reports bad i_blocks and
bad free blocks counts.
The bug that causes these failures is located in ext4_ext_rm_leaf().
Code at the end of the function frees a partial cluster if it's not
shared with an extent remaining in the leaf. However, if all the
extents in the leaf have been removed, the code dereferences an
invalid extent pointer (off the front of the leaf) when the check for
sharing is made. This generally has the effect of unconditionally
freeing the partial cluster, which leads to the observed failures
when the partial cluster is shared with the last extent in the next
leaf.
Fix this by attempting to free the cluster only if extents remain in
the leaf. Any remaining partial cluster will be freed if possible
when the next leaf is processed or when leaf removal is complete.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c06344939422bbd032ac967223a7863de57496b5 upstream.
Commit 9cb00419fa, which enables hole punching for bigalloc file
systems, exposed a bug introduced by commit 6ae06ff51e in an earlier
release. When run on a bigalloc file system, xfstests generic/013, 068,
075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors
or cause kernel error messages indicating that previously freed blocks
are being freed again.
The latter commit optimizes the selection of the starting extent in
ext4_ext_rm_leaf() when hole punching by beginning with the extent
supplied in the path argument rather than with the last extent in the
leaf node (as is still done when truncating). However, the code in
rm_leaf that initially sets partial_cluster to track cluster sharing on
extent boundaries is only guaranteed to run if rm_leaf starts with the
last node in the leaf. Consequently, partial_cluster is not correctly
initialized when hole punching, and a cluster on the boundary of a
punched region that should be retained may instead be deallocated.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit ce37c42919608e96ade3748fe23c3062a0a966c5 upstream.
Commit 3779473246 breaks the return of error codes from
ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks(). A
portion of the patch assigns that function's signed integer return
value to an unsigned int. Consequently, negatively valued error codes
are lost and can be treated as a bogus allocated block count.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 3bbb24b20a8800158c33eca8564f432dd14d0bf3 upstream.
Zach found this deadlock that would happen like this
btrfs_end_transaction <- reduce trans->use_count to 0
btrfs_run_delayed_refs
btrfs_cow_block
find_free_extent
btrfs_start_transaction <- increase trans->use_count to 1
allocate chunk
btrfs_end_transaction <- decrease trans->use_count to 0
btrfs_run_delayed_refs
lock tree block we are cowing above ^^
We need to only decrease trans->use_count if it is above 1, otherwise leave it
alone. This will make nested trans be the only ones who decrease their added
ref, and will let us get rid of the trans->use_count++ hack if we have to commit
the transaction. Thanks,
Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit f88ba6a2a44ee98e8d59654463dc157bb6d13c43 upstream.
I got an error on v3.13:
BTRFS error (device sdf1) in write_all_supers:3378: errno=-5 IO failure (errors while submitting device barriers.)
how to reproduce:
> mkfs.btrfs -f -d raid1 /dev/sdf1 /dev/sdf2
> wipefs -a /dev/sdf2
> mount -o degraded /dev/sdf1 /mnt
> btrfs balance start -f -sconvert=single -mconvert=single -dconvert=single /mnt
The reason of the error is that barrier_all_devices() failed to submit
barrier to the missing device. However it is clear that we cannot do
anything on missing device, and also it is not necessary to care chunks
on the missing device.
This patch stops sending/waiting barrier if device is missing.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c88547a8119e3b581318ab65e9b72f27f23e641d upstream.
Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced
in 3.10 incorrectly converted the btree hash index array pointer in
xfs_da3_fixhashpath(). It resulted in the the current hash always
being compared against the first entry in the btree rather than the
current block index into the btree block's hash entry array. As a
result, it was comparing the wrong hashes, and so could misorder the
entries in the btree.
For most cases, this doesn't cause any problems as it requires hash
collisions to expose the ordering problem. However, when there are
hash collisions within a directory there is a very good probability
that the entries will be ordered incorrectly and that actually
matters when duplicate hashes are placed into or removed from the
btree block hash entry array.
This bug results in an on-disk directory corruption and that results
in directory verifier functions throwing corruption warnings into
the logs. While no data or directory entries are lost, access to
them may be compromised, and attempts to remove entries from a
directory that has suffered from this corruption may result in a
filesystem shutdown. xfs_repair will fix the directory hash
ordering without data loss occuring.
[dchinner: wrote useful a commit message]
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5acda9d12dcf1ad0d9a5a2a7c646de3472fa7555 upstream.
After commit 839a8e8660b6 ("writeback: replace custom worker pool
implementation with unbound workqueue") when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.
This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.
Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|