summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-10-10vfs: dcache: fix deadlock in tree traversalMiklos Szeredi
commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream. IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-10cifs: fix return value in cifsConvertToUTF16Jeff Layton
commit c73f693989d7a7d99ec66a7065295a0c93d0b127 upstream. This function returns the wrong value, which causes the callers to get the length of the resulting pathname wrong when it contains non-ASCII characters. This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767 Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com> Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-10vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi
commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream. IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-10fs/proc: fix potential unregister_sysctl_table hangFrancesco Ruggeri
commit 6bf6104573482570f7103d3e5ddf9574db43a363 upstream. The unregister_sysctl_table() function hangs if all references to its ctl_table_header structure are not dropped. This can happen sometimes because of a leak in proc_sys_lookup(): proc_sys_lookup() gets a reference to the table via lookup_entry(), but it does not release it when a subsequent call to sysctl_follow_link() fails. This patch fixes this leak by making sure the reference is always dropped on return. See also commit 076c3eed2c31 ("sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry") which reorganized this code in 3.4. Tested in Linux 3.4.4. Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19vfs: make O_PATH file descriptors usable for 'fstat()'Linus Torvalds
commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 upstream. We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit 332a2e1244bd, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19VFS: make vfs_fstat() use f[get|put]_light()Linus Torvalds
commit e994defb7b6813ba6fa7a2a36e86d2455ad1dc35 upstream. Use the *_light() versions that properly avoid doing the file user count updates when they are unnecessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19eCryptfs: Copy up attributes of the lower target inode after renameTyler Hicks
commit 8335eafc2859e1a26282bef7c3d19f3d68868b8a upstream. After calling into the lower filesystem to do a rename, the lower target inode's attributes were not copied up to the eCryptfs target inode. This resulted in the eCryptfs target inode staying around, rather than being evicted, because i_nlink was not updated for the eCryptfs inode. This also meant that eCryptfs didn't do the final iput() on the lower target inode so it stayed around, as well. This would result in a failure to free up space occupied by the target file in the rename() operation. Both target inodes would eventually be evicted when the eCryptfs filesystem was unmounted. This patch calls fsstack_copy_attr_all() after the lower filesystem does its ->rename() so that important inode attributes, such as i_nlink, are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called and eCryptfs can drop its final reference on the lower inode. http://launchpad.net/bugs/561129 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19NFS: return error from decode_getfh in decode openWeston Andros Adamson
commit 01913b49cf1dc6409a07dd2a4cc6af2e77f3c410 upstream. If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last decode_* call must have succeeded. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19NFS: Fix a problem with the legacy binary mount codeTrond Myklebust
commit 872ece86ea5c367aa92f44689c2d01a1c767aeb3 upstream. Apparently, am-utils is still using the legacy binary mountdata interface, and is having trouble parsing /proc/mounts due to the 'port=' field being incorrectly set. The following patch should fix up the regression. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust
commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19CIFS: Fix error handling in cifs_push_mandatory_locksPavel Shilovsky
commit e2f2886a824ff0a56da1eaa13019fde86aa89fa6 upstream. Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19udf: Fix data corruption for files in ICBJan Kara
commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream. When a file is stored in ICB (inode), we overwrite part of the file, and the page containing file's data is not in page cache, we end up corrupting file's data by overwriting them with zeros. The problem is we use simple_write_begin() which simply zeroes parts of the page which are not written to. The problem has been introduced by be021ee4 (udf: convert to new aops). Fix the problem by providing a ->write_begin function which makes the page properly uptodate. Reported-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19fuse: fix retrieve lengthMiklos Szeredi
commit c9e67d483776d8d2a5f3f70491161b205930ffe1 upstream. In some cases fuse_retrieve() would return a short byte count if offset was non-zero. The data returned was correct, though. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19ext3: Fix fdatasync() for files with only i_size changesJan Kara
commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12Squashfs: fix mount time sanity check for corrupted superblockPhillip Lougher
commit cc37f75a9ffbbfcb1c3297534f293c8284e3c5a6 upstream. A Squashfs filesystem containing nothing but an empty directory, although unusual and ultimately pointless, is still valid. The directory_table >= next_table sanity check rejects these filesystems as invalid because the directory_table is empty and equal to next_table. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidateTrond Myklebust
Fix the following Oops in 3.5.1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] PGD 337c63067 PUD 0 Oops: 0000 [#1] SMP CPU 5 Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal processor thermal_sys Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT RIP: 0010:[<ffffffffa03789cd>] [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP: 0018:ffff8801b418bd38 EFLAGS: 00010292 RAX: 00000000fffffff6 RBX: ffff88032016d800 RCX: 0000000000000020 RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff8801824a7b00 RBP: ffff8801b418bdf8 R08: 7fffff0034323030 R09: fffffffff04c03ed R10: ffff8801824a7b00 R11: 0000000000000002 R12: ffff8801824a7b00 R13: ffff8801824a7b00 R14: 0000000000000000 R15: ffff8803201725d0 FS: 00002b53a46cb700(0000) GS:ffff88033fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 000000020a426000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process java (pid: 30431, threadinfo ffff8801b418a000, task ffff8801b5d20600) Stack: ffff8801b418be44 ffff88032016d800 ffff8801b418bdf8 0000000000000000 ffff8801824a7b00 ffff8801b418bdd7 ffff8803201725d0 ffffffff8116a9c0 ffff8801b5c38dc0 0000000000000007 ffff88032016d800 0000000000000000 Call Trace: [<ffffffff8116a9c0>] lookup_dcache+0x80/0xe0 [<ffffffff8116aa43>] __lookup_hash+0x23/0x90 [<ffffffff8116b4a5>] lookup_one_len+0xc5/0x100 [<ffffffffa03869a3>] nfs_sillyrename+0xe3/0x210 [nfs] [<ffffffff8116cadf>] vfs_unlink.part.25+0x7f/0xe0 [<ffffffff8116f22c>] do_unlinkat+0x1ac/0x1d0 [<ffffffff815717b9>] system_call_fastpath+0x16/0x1b [<00002b5348b5f527>] 0x2b5348b5f526 Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 <f6> 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89 RIP [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP <ffff8801b418bd38> CR2: 0000000000000038 ---[ end trace 845113ed191985dd ]--- This Oops affects 3.5 kernels and older, and is due to lookup_one_len() calling down to the dentry revalidation code with a NULL pointer to struct nameidata. It is fixed upstream by commit 0b728e1911c (stop passing nameidata * to ->d_revalidate()) Reported-by: Richard Ems <richard.ems@cape-horn-eng.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12block: replace __getblk_slow misfix by grow_dev_page fixHugh Dickins
commit 676ce6d5ca3098339c028d44fe0427d1566a4d2d upstream. Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12fs/buffer.c: remove BUG() in possible but rare conditionGlauber Costa
commit 61065a30af8df4b8989c2ac7a1f4b4034e4df2d5 upstream. While stressing the kernel with with failing allocations today, I hit the following chain of events: alloc_page_buffers(): bh = alloc_buffer_head(GFP_NOFS); if (!bh) goto no_grow; <= path taken grow_dev_page(): bh = alloc_page_buffers(page, size, 0); if (!bh) goto failed; <= taken, consequence of the above and then the failed path BUG()s the kernel. The failure is inserted a litte bit artificially, but even then, I see no reason why it should be deemed impossible in a real box. Even though this is not a condition that we expect to see around every time, failed allocations are expected to be handled, and BUG() sounds just too much. As a matter of fact, grow_dev_page() can return NULL just fine in other circumstances, so I propose we just remove it, then. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12vfs: missed source of ->f_pos racesAl Viro
commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12NFSv3: Ensure that do_proc_get_root() reports errors correctlyTrond Myklebust
commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12ext4: fix long mount times on very big file systemsTheodore Ts'o
commit 0548bbb85337e532ca2ed697c3e9b227ff2ed4b4 upstream. Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12NFS: Alias the nfs module to nfs4bjschuma@gmail.com
commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12vfs: canonicalize create mode in build_open_flags()Miklos Szeredi
commit e68726ff72cf7ba5e7d789857fcd9a75ca573f03 upstream. Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12fuse: verify all ioctl retry iov elementsZach Brown
commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o
commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12pnfs: defer release of pages in layoutgetIdan Kedar
commit 8554116e17eef055d9dd58a94b3427cb2ad1c317 upstream. we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12fix page number calculation bug for block layout decode bufferJim Rees
commit 10bd295a0b6488ebe634b72a11d8986bd3af3819 upstream. Signed-off-by: Jim Rees <rees@umich.edu> Suggested-by: Andy Adamson <andros@netapp.com> Suggested-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12NFSv4.1 fix page number calculation bug for filelayout decode buffersAndy Adamson
commit e5265a0c587423bbd21a6b39a572cecff16b9346 upstream. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12NFS: Use kcalloc() when allocating arraysTrond Myklebust
commit 7d9dea915fe333357912bce2d624ee848dfbd890 upstream. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-12nfs: tear down caches in nfs_init_writepagecache when allocation failsJeff Layton
commit 3dd4765fce04c0b4af1e0bc4c0b10f906f95fabc upstream. ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: drop the nfs_cdata_cachep cleanup; it doesn't exist] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-19hfsplus: fix overflow in sector calculations in hfsplus_submit_bioJanne Kalliomäki
commit a6dc8c04218eb752ff79cdc24a995cf51866caed upstream. The variable io_size was unsigned int, which caused the wrong sector number to be calculated after aligning it. This then caused mount to fail with big volumes, as backup volume header information was searched from a wrong sector. Signed-off-by: Janne Kalliomäki <janne@tuxera.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10ore: Fix out-of-bounds access in _ios_obj()Boaz Harrosh
commit 9e62bb4458ad2cf28bd701aa5fab380b846db326 upstream. _ios_obj() is accessed by group_index not device_table index. The oc->comps array is only a group_full of devices at a time it is not like ore_comp_dev() which is indexed by a global device_table index. This did not BUG until now because exofs only uses a single COMP for all devices. But with other FSs like PanFS this is not true. This bug was only in the write_path, all other users were using it correctly [This is a bug since 3.2 Kernel] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi
commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton
commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10nfsd4: our filesystems are normally case sensitiveJ. Bruce Fields
commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream. Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02Btrfs: call the ordered free operation without any locks heldChris Mason
commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02locks: fix checking of fcntl_setlease argumentJ. Bruce Fields
commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream. The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually *don't* cause problems in mainline, as of Dave Jones's commit 8d657eb3b438 "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02ext4: undo ext4_calc_metadata_amount if we fail to claim spaceTheodore Ts'o
commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream. The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02ext4: don't let i_reserved_meta_blocks go negativeBrian Foster
commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02udf: Improve table length check to avoid possible overflowJan Kara
commit 57b9655d01ef057a523e810d29c37ac09b80eead upstream. When a partition table length is corrupted to be close to 1 << 32, the check for its length may overflow on 32-bit systems and we will think the length is valid. Later on the kernel can crash trying to read beyond end of buffer. Fix the check to avoid possible overflow. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02ext4: fix overhead calculation used by ext4_statfs()Theodore Ts'o
commit 952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream. Commit f975d6bcc7a introduced bug which caused ext4_statfs() to miscalculate the number of file system overhead blocks. This causes the f_blocks field in the statfs structure to be larger than it should be. This would in turn cause the "df" output to show the number of data blocks in the file system and the number of data blocks used to be larger than they should be. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02ext4: pass a char * to ext4_count_free() instead of a buffer_head ptrTheodore Ts'o
commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream. Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmapsJeff Layton
commit 3cf003c08be785af4bee9ac05891a15bcbff856a upstream. Jian found that when he ran fsx on a 32 bit arch with a large wsize the process and one of the bdi writeback kthreads would sometimes deadlock with a stack trace like this: crash> bt PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx" #0 [eed63cbc] schedule at c083c5b3 #1 [eed63d80] kmap_high at c0500ec8 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs] #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs] #4 [eed63e50] do_writepages at c04f3e32 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a #6 [eed63ea4] filemap_fdatawrite at c04e1b3e #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs] #8 [eed63ecc] do_sync_write at c052d202 #9 [eed63f74] vfs_write at c052d4ee #10 [eed63f94] sys_write at c052df4c #11 [eed63fb0] ia32_sysenter_target at c0409a98 EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6 DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000 SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033 CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246 Each task would kmap part of its address array before getting stuck, but not enough to actually issue the write. This patch fixes this by serializing the marshal_iov operations for async reads and writes. The idea here is to ensure that cifs aggressively tries to populate a request before attempting to fulfill another one. As soon as all of the pages are kmapped for a request, then we can unlock and allow another one to proceed. There's no need to do this serialization on non-CONFIG_HIGHMEM arches however, so optimize all of this out when CONFIG_HIGHMEM isn't set. Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02mm: compaction: introduce sync-light migration for use by compactionMel Gorman
commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream. Stable note: Not tracked in Buzilla. This was part of a series that reduced interactivity stalls experienced when THP was enabled. These stalls were particularly noticable when copying data to a USB stick but the experiences for users varied a lot. This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02mm: compaction: determine if dirty pages can be migrated without blocking ↵Mel Gorman
within ->migratepage commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream. Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging information by reducing LRU list churning had the side-effect of reducing THP allocation success rates. This was part of a series to restore the success rates while preserving the reclaim fix. Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check; if (PageDirty(page) && !sync && mapping->a_ops->migratepage != migrate_page) rc = -EBUSY; This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25eCryptfs: Properly check for O_RDONLY flag before doing privileged openTyler Hicks
commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream. If the first attempt at opening the lower file read/write fails, eCryptfs will retry using a privileged kthread. However, the privileged retry should not happen if the lower file's inode is read-only because a read/write open will still be unsuccessful. The check for determining if the open should be retried was intended to be based on the access mode of the lower file's open flags being O_RDONLY, but the check was incorrectly performed. This would cause the open to be retried by the privileged kthread, resulting in a second failed open of the lower file. This patch corrects the check to determine if the open request should be handled by the privileged kthread. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25eCryptfs: Fix lockdep warning in miscdev operationsTyler Hicks
commit 60d65f1f07a7d81d3eb3b91fc13fca80f2fdbb12 upstream. Don't grab the daemon mutex while holding the message context mutex. Addresses this lockdep warning: ecryptfsd/2141 is trying to acquire lock: (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] but task is already holding lock: (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(*daemon)->mux){+.+...}: [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220 [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0 [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50 [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs] [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs] [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs] [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs] [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs] [<ffffffff811963a4>] vfs_create+0xb4/0x120 [<ffffffff81197865>] do_last+0x8c5/0xa10 [<ffffffff811998f9>] path_openat+0xd9/0x460 [<ffffffff81199da2>] do_filp_open+0x42/0xa0 [<ffffffff81187998>] do_sys_open+0xf8/0x1d0 [<ffffffff81187a91>] sys_open+0x21/0x30 [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}: [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50 [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220 [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0 [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50 [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] [<ffffffff811887d3>] vfs_read+0xb3/0x180 [<ffffffff811888ed>] sys_read+0x4d/0x90 [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25eCryptfs: Gracefully refuse miscdev file ops on inherited/passed filesTyler Hicks
commit 8dc6780587c99286c0d3de747a2946a76989414a upstream. File operations on /dev/ecryptfs would BUG() when the operations were performed by processes other than the process that originally opened the file. This could happen with open files inherited after fork() or file descriptors passed through IPC mechanisms. Rather than calling BUG(), an error code can be safely returned in most situations. In ecryptfs_miscdev_release(), eCryptfs still needs to handle the release even if the last file reference is being held by a process that didn't originally open the file. ecryptfs_find_daemon_by_euid() will not be successful, so a pointer to the daemon is stored in the file's private_data. The private_data pointer is initialized when the miscdev file is opened and only used when the file is released. https://launchpad.net/bugs/994247 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25pnfs-obj: Fix __r4w_get_page when offset is beyond i_sizeBoaz Harrosh
commit c999ff68029ebd0f56ccae75444f640f6d5a27d2 upstream. It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global zero_page, if offset is beyond i_size. TODO: Change the API to ->__r4w_get_page() so a NULL can be returned without being considered as error, since XOR API treats NULL entries as zero_pages. [Bug since 3.2. Should apply the same way to all Kernels since] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [bwh: Backported to 3.2: adjust for lack of wdata->header] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25pnfs-obj: don't leak objio_state if ore_write/read failsBoaz Harrosh
commit 9909d45a8557455ca5f8ee7af0f253debc851f1a upstream. [Bug since 3.2 Kernel] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>