summaryrefslogtreecommitdiff
path: root/fs/nfs/write.c
AgeCommit message (Collapse)Author
2012-01-06NFS: Remove pNFS bloat from the generic write pathTrond Myklebust
We have no business doing any this in the standard write release path. Get rid of it, and put it in the pNFS layer. Also, while we're at it, get rid of the completely bogus unlock/relock semantics that were present in nfs_writeback_release_full(). It is not only unnecessary, but actually dangerous to release the write lock just in order to take it again in nfs_page_async_flush(). Better just to open code the pgio operations in a pnfs helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-02nfs: Remove unused variable from write.cRakib Mullick
When CONFIG_NFS=y and CONFIG_NFS_V3_{,V4}=n we get the following warning. fs/nfs/write.c: In function ‘nfs_writeback_done’: fs/nfs/write.c:1246:21: warning: unused variable ‘server’ Remove the variable 'server' to fix the above warning. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31fs: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macrosPaul Gortmaker
These files were getting <linux/module.h> via an implicit include path, but we want to crush those out of existence since they cost time during compiles of processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-19NFS: Get rid of unnecessary calls to ClearPageError() in read codeTrond Myklebust
The generic file read code does that for us anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19NFS: Get rid of nfs_restart_rpc()Trond Myklebust
It can trivially be replaced with rpc_restart_call_prepare. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18NFS: Use the inode->i_version to cache NFSv4 change attribute informationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfs: recoalesce when ld write pagelist failsPeng Tao
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs: don't try to migrate pages with active requestsJeff Layton
nfs_find_and_lock_request will take a reference to the nfs_page and will then put it if the req is already locked. It's possible though that the reference will be the last one. That put then can kick off a whole series of reference puts: nfs_page nfs_open_context dentry inode If the inode ends up being deleted, then the VFS will call truncate_inode_pages. That function will try to take the page lock, but it was already locked when migrate_page was called. The code deadlocks. Fix this by simply refusing the migration request if PagePrivate is already set, indicating that the page is already associated with an active read or write request. We've had a customer test a backported version of this patch and the preliminary results seem good. Cc: stable@kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pagesJeff Layton
commit 420e3646 allowed the kernel to reduce the number of unnecessary commit calls by skipping the commit when there are a large number of outstanding pages. However, the current test in nfs_commit_unstable_pages does not handle the edge condition properly. When ncommit == 0, then that means that the kernel doesn't need to do anything more for the inode. The current test though in the WB_SYNC_NONE case will return true, and the inode will end up being marked dirty. Once that happens the inode will never be clean until there's a WB_SYNC_ALL flush. Fix this by immediately returning from nfs_commit_unstable_pages when ncommit == 0. Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which has a backported version of 420e3646. The inode cache there was growing very large. The inode cache was unable to be shrunk since the inodes were all marked dirty. Calling sync() would essentially "fix" the problem -- the WB_SYNC_ALL flush would result in the inodes all being marked clean. What I'm not clear on is how big a problem this is in mainline kernels as the writeback code there is very different. Either way, it seems incorrect to re-mark the inode dirty in this case. Reported-by: Mike McLean <mikem@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@kernel.org [2.6.34+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when ↵Trond Myklebust
syncing" This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2. The reverted commit was rendered obsolete by a VFS fix: commit 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in two steps). We now no longer need to worry about writeback_single_inode() missing our marking the inode for COMMIT in 'do_writepages()' call. Reverting this patch, fixes a performance regression in which the inode would continuously get queued to the dirty list, causing the writeback code to unnecessarily try to send a COMMIT. Signed-off-by: Trond Myklebust <Trond.Myklebust> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org [2.6.35+]
2011-09-13NFS: Fix a typo in nfs_flush_multiTrond Myklebust
Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Sricharan R <r.sricharan@ti.com>
2011-07-27Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation() nfs: don't use d_move in nfs_async_rename_done RDMA: Increasing RPCRDMA_MAX_DATA_SEGS SUNRPC: Replace xprt->resend and xprt->sending with a priority queue SUNRPC: Allow caller of rpc_sleep_on() to select priority levels SUNRPC: Support dynamic slot allocation for TCP connections SUNRPC: Clean up the slot table allocation SUNRPC: Initalise the struct xprt upon allocation SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot pnfs: simplify pnfs files module autoloading nfs: document nfsv4 sillyrename issues NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL SUNRPC: sunrpc should not explicitly depend on NFS config options NFS: Clean up - simplify the switch to read/write-through-MDS NFS: Move the pnfs write code into pnfs.c NFS: Move the pnfs read code into pnfs.c NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request NFS: Cache rpc_ops in struct nfs_pageio_descriptor ...
2011-07-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits) mm: properly reflect task dirty limits in dirty_exceeded logic writeback: don't busy retry writeback on new/freeing inodes writeback: scale IO chunk size up to half device bandwidth writeback: trace global_dirty_state writeback: introduce max-pause and pass-good dirty limits writeback: introduce smoothed global dirty limit writeback: consolidate variable names in balance_dirty_pages() writeback: show bdi write bandwidth in debugfs writeback: bdi write bandwidth estimation writeback: account per-bdi accumulated written pages writeback: make writeback_control.nr_to_write straight writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr() writeback: trace event writeback_queue_io writeback: trace event writeback_single_inode writeback: remove .nonblocking and .encountered_congestion writeback: remove writeback_control.more_io writeback: skip balance_dirty_pages() for in-memory fs writeback: add bdi_dirty_limit() kernel-doc writeback: avoid extra sync work at enqueue time writeback: elevate queue_io() into wb_writeback() ... Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c
2011-07-25Merge branch 'master' into devel and apply fixup from Stephen Rothwell:Stephen Rothwell
vfs/nfs: fixup for nfs_open_context change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-20nfs_open_context doesn't need struct path eitherAl Viro
just dentry, please... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-15NFS: Clean up - simplify the switch to read/write-through-MDSTrond Myklebust
Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of completely reinitialising the struct nfs_pageio_descriptor. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15NFS: Move the pnfs write code into pnfs.cTrond Myklebust
...and ensure that we recoalese to take into account differences in differences in block sizes when falling back to write through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write requestTrond Myklebust
Instead of looking up the rsize and wsize, the routines that generate the RPC requests should really be using the pg_bsize, since that is what we use when deciding whether or not to coalesce write requests... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15NFS: Cache rpc_ops in struct nfs_pageio_descriptorTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15NFS: Clean up: split out the RPC transmission from nfs_pagein_multi/oneTrond Myklebust
...and do the same for nfs_flush_multi/one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15NFS: fix return value of nfs_pagein_one/nfs_flush_onePeng Tao
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFS: Clean up nfs_read_rpcsetup and nfs_write_rpcsetupTrond Myklebust
Split them up into two parts: one which sets up the struct nfs_read/write_data, the other which sets up the actual RPC call or pNFS call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFS: Don't use DATA_SYNC writesTrond Myklebust
If we're writing back data, and the FLUSH_STABLE flag is set, then we always want to use NFS_FILE_SYNC, since we're always in a situation where we're doing page reclaim, and so we want to free up the page as quickly as possible. If we're in the FLUSH_COND_STABLE case, then we either want to use another unstable write (if we have to do a commit anyway) or again, we want to use NFS_FILE_SYNC because we know that we have no more pages to write out. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFSv4.1: File layout only supports whole file layoutsAndy Adamson
Ask for whole file layouts. Until support for layout segments is fully supported in the file layout code, discard non-whole file layouts. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFSv4.1: Fall back to ordinary i/o through the mds if we have no layout segmentTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFSv4.1: Add an initialisation callback for pNFSTrond Myklebust
Ensure that we always get a layout before setting up the i/o request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12NFS: Cleanup of the nfs_pageio code in preparation for a pnfs bugfixTrond Myklebust
We need to ensure that the layouts are set up before we can decide to coalesce requests. To do so, we want to further split up the struct nfs_pageio_descriptor operations into an initialisation callback, a coalescing test callback, and a 'do i/o' callback. This patch cleans up the existing callback methods before adding the 'initialisation' callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-28pnfs: write: Set mds_offset in the generic layer - it is needed by all LDsBoaz Harrosh
In current pnfs tree, all the layouts set mds_offset in their .write_pagelist member. mds_offset is only used by generic layer and should be handled by it. This patch is for upstream. It is needed in this -rc series to fix a bug in objects layout_commit. I'll send patches for objects and blocks to be squashed into current pnfs tree. TODO: It looks like the read path needs the same patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-08writeback: remove .nonblocking and .encountered_congestionWu Fengguang
Remove two unused struct writeback_control fields: .encountered_congestion (completely unused) .nonblocking (never set, checked/showed in XFS,NFS/btrfs) The .for_background check in nfs_write_inode() is also removed btw, as .for_background implies WB_SYNC_NONE. Reviewed-by: Jan Kara <jack@suse.cz> Proposed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-05-29NFSv4.1: unify pnfs_pageio_init functionsBenny Halevy
Use common code for pnfs_pageio_init_{read,write} and use a common generic pg_test function. Note that this function always assumes the the layout driver's pg_test method is implemented. [Fix BUG] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: Use byte-range for layoutgetBenny Halevy
Add offset and count parameters to pnfs_update_layout and use them to get the layout in the pageio path. Order cache layout segments in the following order: * offset (ascending) * length (descending) * iomode (RW before READ) Test byte range against the layout segment in use in pnfs_{read,write}_pg_test so not to coalesce pages not using the same layout segment. [fix lseg ordering] [clean up pnfs_find_lseg lseg arg] [remove unnecessary FIXME] [fix ordering in pnfs_insert_layout] [clean up pnfs_insert_layout] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-11NFSv4.1: Ensure that layoutget uses the correct gfp modesTrond Myklebust
Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12NFS: Get rid of pointless test in nfs_commit_doneTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12NFS: Eliminate duplicate call to nfs_mark_request_dirtyTrond Myklebust
We only need to call nfs_mark_request_dirty() once in nfs_writepage_setup(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12nfs: don't call __mark_inode_dirty while holding i_lockDave Chinner
nfs_scan_commit() is called with the inode->i_lock held, but it then calls __mark_inode_dirty() while still holding the lock. This causes a deadlock. Push the inode->i_lock into nfs_scan_commit() so it can protect only the parts of the code it needs to and can be dropped before the call to __mark_inode_dirty() to avoid the deadlock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Will Simoneau <simoneau@ele.uri.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-27NFS: Fix a hang in the writeback pathTrond Myklebust
Now that the inode scalability patches have been merged, it is no longer safe to call igrab() under the inode->i_lock. Now that we no longer call nfs_clear_request() until the nfs_page is being freed, we know that we are always holding a reference to the nfs_open_context, which again holds a reference to the path, and so the inode cannot be freed until the last nfs_page has been removed from the radix tree and freed. We can therefore skip the igrab()/iput() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24NFSv4.1 convert layoutcommit sync to booleanAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: layoutcommitAndy Adamson
The filelayout driver sends LAYOUTCOMMIT only when COMMIT goes to the data server (as opposed to the MDS) and the data server WRITE is not NFS_FILE_SYNC. Only whole file layout support means that there is only one IOMODE_RW layout segment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Tested-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: filelayout driver specific code for COMMITFred Isaman
Implement all the hooks created in the previous patches. This requires exporting quite a few functions and adding a few structure fields. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: remove GETATTR from ds commitsFred Isaman
Any COMMIT compound directed to a data server needs to have the GETATTR calls suppressed. We here, make sure the field we are testing (data->lseg) is set and refcounted correctly. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: add generic layer hooks for pnfs COMMITFred Isaman
We create three major hooks for the pnfs code. pnfs_mark_request_commit() is called during writeback_done from nfs_mark_request_commit, which gives the driver an opportunity to claim it wants control over commiting a particular req. pnfs_choose_commit_list() is called from nfs_scan_list to choose which list a given req should be added to, based on where we intend to send it for COMMIT. It is up to the driver to have preallocated list headers for each destination it may need. pnfs_commit_list() is how the driver actually takes control, it is used instead of nfs_commit_list(). In order to pass information between the above functions, we create a union in nfs_page to hold a lseg (which is possible because the req is not on any list while in transition), and add some flags to indicate if we need to use the pnfs code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: pull out code from nfs_commit_releaseFred Isaman
Create a separate support function for later use by data server commit code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: pull error handling out of nfs_commit_listFred Isaman
Create a separate support function for later use by data server commit code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: rearrange nfs_commit_rpcsetupFred Isaman
Reorder nfs_commit_rpcsetup, preparing for a pnfs entry point. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23NFSv4.1: don't send COMMIT to ds for data sync writesFred Isaman
Based on consensus reached in Feb 2011 interim IETF meeting regarding use of LAYOUTCOMMIT, it has been decided that a NFS_DATA_SYNC return from a WRITE to data server should not initiate a COMMIT. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-21NFS: Fix a hang/infinite loop in nfs_wb_page()Trond Myklebust
When one of the two waits in nfs_commit_inode() is interrupted, it returns a non-negative value, which causes nfs_wb_page() to think that the operation was successful causing it to busy-loop rather than exiting. It also causes nfs_file_fsync() to incorrectly report the file as being successfully committed to disk. This patch fixes both problems by ensuring that we return an error if the attempts to wait fail. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2011-03-21FS: Use stable writes when not doing a bulk flushTrond Myklebust
If we're only doing a single write, and there are no other unstable writes being queued up, we might want to just flip to using a stable write RPC call. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11NFSv4.1: Clear lseg pointer in ->doio functionFred Isaman
Now that we have access to the pointer, clear it immediately after the put, instead of in caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11NFSv4.1: rearrange ->doio argsFred Isaman
This will make it possible to clear the lseg pointer in the same function as it is put, instead of in the caller nfs_pageio_doio(). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>