<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/fadvise.c, branch v3.2.26</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM</title>
<updated>2010-03-06T19:26:25+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2010-03-05T21:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0141450f66c3c12a3aaa869748caa64241885cdf'/>
<id>0141450f66c3c12a3aaa869748caa64241885cdf</id>
<content type='text'>
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably.  And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;			[2.6.33.x]
Cc: &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably.  And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;			[2.6.33.x]
Cc: &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: move max_sane_readahead() calls into force_page_cache_readahead()</title>
<updated>2009-06-17T02:47:28+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2009-06-16T22:31:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f7e839dd36fd940b0202cfb7d39b2a1b2dc59b1b'/>
<id>f7e839dd36fd940b0202cfb7d39b2a1b2dc59b1b</id>
<content type='text'>
Impact: code simplification.

Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Impact: code simplification.

Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[CVE-2009-0029] System call wrapper special cases</title>
<updated>2009-01-14T13:15:18+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2009-01-14T13:14:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6673e0c3fbeaed2cd08e2fd4a4aa97382d6fedb0'/>
<id>6673e0c3fbeaed2cd08e2fd4a4aa97382d6fedb0</id>
<content type='text'>
System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove Andrew Morton's old email accounts</title>
<updated>2008-10-16T18:21:32+00:00</updated>
<author>
<name>Francois Cami</name>
<email>francois.cami@free.fr</email>
</author>
<published>2008-10-16T05:01:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e1f8e87449147ffe5ea3de64a46af7de450ce279'/>
<id>e1f8e87449147ffe5ea3de64a46af7de450ce279</id>
<content type='text'>
People can use the real name an an index into MAINTAINERS to find the
current email address.

Signed-off-by: Francois Cami &lt;francois.cami@free.fr&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
People can use the real name an an index into MAINTAINERS to find the
current email address.

Signed-off-by: Francois Cami &lt;francois.cami@free.fr&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xip: support non-struct page backed memory</title>
<updated>2008-04-28T15:58:23+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-04-28T09:13:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=70688e4dd1647f0ceb502bbd5964fa344c5eb411'/>
<id>70688e4dd1647f0ceb502bbd5964fa344c5eb411</id>
<content type='text'>
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
the user mappings.

This requires the get_xip_page API to be changed to an address based one.
Improve the API layering a little bit too, while we're here.

This is required in order to support XIP filesystems on memory that isn't
backed with struct page (but memory with struct page is still supported too).

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Jared Hulbert &lt;jaredeh@gmail.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for
the user mappings.

This requires the get_xip_page API to be changed to an address based one.
Improve the API layering a little bit too, while we're here.

This is required in order to support XIP filesystems on memory that isn't
backed with struct page (but memory with struct page is still supported too).

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Jared Hulbert &lt;jaredeh@gmail.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>check ADVICE of fadvise64_64 even if get_xip_page is given</title>
<updated>2008-02-05T17:44:19+00:00</updated>
<author>
<name>Masatake YAMATO</name>
<email>yamato@redhat.com</email>
</author>
<published>2008-02-05T06:29:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b5beb1caff4ce063e6e2dc13f23b80eeef4e9782'/>
<id>b5beb1caff4ce063e6e2dc13f23b80eeef4e9782</id>
<content type='text'>
I've written some test programs in ltp project.  During writing I met an
problem which I cannot solve in user land.  So I wrote a patch for linux
kernel.  Please, include this patch if acceptable.

The test program tests the 4th parameter of fadvise64_64:

    long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice);

My test case calls fadvise64_64 with invalid advice value and checks errno is
set to EINVAL.  About the advice parameter man page says:

    ...
    Permissible values for advice include:

	   POSIX_FADV_NORMAL
                  ...
	   POSIX_FADV_SEQUENTIAL
                  ...
	   POSIX_FADV_RANDOM
		  ...
	   POSIX_FADV_NOREUSE
                  ...
	   POSIX_FADV_WILLNEED
                  ...
	   POSIX_FADV_DONTNEED
		  ...
    ERRORS
           ...
	   EINVAL An invalid value was specified for advice.

However, I got a bug report that the system call invocations
in my test case returned 0 unexpectedly.

I've inspected the kernel code:

    asmlinkage long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
    {
	    struct file *file = fget(fd);
	    struct address_space *mapping;
	    struct backing_dev_info *bdi;
	    loff_t endbyte;			/* inclusive */
	    pgoff_t start_index;
	    pgoff_t end_index;
	    unsigned long nrpages;
	    int ret = 0;

	    if (!file)
		    return -EBADF;

	    if (S_ISFIFO(file-&gt;f_path.dentry-&gt;d_inode-&gt;i_mode)) {
		    ret = -ESPIPE;
		    goto out;
	    }

	    mapping = file-&gt;f_mapping;
	    if (!mapping || len &lt; 0) {
		    ret = -EINVAL;
		    goto out;
	    }

	    if (mapping-&gt;a_ops-&gt;get_xip_page)
		    /* no bad return value, but ignore advice */
		    goto out;
    ...
    out:
	    fput(file);
	    return ret;
    }

I found the advice parameter is just ignored in the case
mapping-&gt;a_ops-&gt;get_xip_page is given. This behavior is different from
what is written on the man page. Is this o.k.?

get_xip_page is given if CONFIG_EXT2_FS_XIP is true.
Anyway I cannot find the easy way to detect get_xip_page
field is given or CONFIG_EXT2_FS_XIP is true from the
user space.

I propose the following patch which checks the advice parameter
even if get_xip_page is given.

Signed-off-by: Masatake YAMATO &lt;yamato@redhat.com&gt;
Acked-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I've written some test programs in ltp project.  During writing I met an
problem which I cannot solve in user land.  So I wrote a patch for linux
kernel.  Please, include this patch if acceptable.

The test program tests the 4th parameter of fadvise64_64:

    long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice);

My test case calls fadvise64_64 with invalid advice value and checks errno is
set to EINVAL.  About the advice parameter man page says:

    ...
    Permissible values for advice include:

	   POSIX_FADV_NORMAL
                  ...
	   POSIX_FADV_SEQUENTIAL
                  ...
	   POSIX_FADV_RANDOM
		  ...
	   POSIX_FADV_NOREUSE
                  ...
	   POSIX_FADV_WILLNEED
                  ...
	   POSIX_FADV_DONTNEED
		  ...
    ERRORS
           ...
	   EINVAL An invalid value was specified for advice.

However, I got a bug report that the system call invocations
in my test case returned 0 unexpectedly.

I've inspected the kernel code:

    asmlinkage long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
    {
	    struct file *file = fget(fd);
	    struct address_space *mapping;
	    struct backing_dev_info *bdi;
	    loff_t endbyte;			/* inclusive */
	    pgoff_t start_index;
	    pgoff_t end_index;
	    unsigned long nrpages;
	    int ret = 0;

	    if (!file)
		    return -EBADF;

	    if (S_ISFIFO(file-&gt;f_path.dentry-&gt;d_inode-&gt;i_mode)) {
		    ret = -ESPIPE;
		    goto out;
	    }

	    mapping = file-&gt;f_mapping;
	    if (!mapping || len &lt; 0) {
		    ret = -EINVAL;
		    goto out;
	    }

	    if (mapping-&gt;a_ops-&gt;get_xip_page)
		    /* no bad return value, but ignore advice */
		    goto out;
    ...
    out:
	    fput(file);
	    return ret;
    }

I found the advice parameter is just ignored in the case
mapping-&gt;a_ops-&gt;get_xip_page is given. This behavior is different from
what is written on the man page. Is this o.k.?

get_xip_page is given if CONFIG_EXT2_FS_XIP is true.
Anyway I cannot find the easy way to detect get_xip_page
field is given or CONFIG_EXT2_FS_XIP is true from the
user space.

I propose the following patch which checks the advice parameter
even if get_xip_page is given.

Signed-off-by: Masatake YAMATO &lt;yamato@redhat.com&gt;
Acked-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path</title>
<updated>2006-12-08T16:28:43+00:00</updated>
<author>
<name>Josef "Jeff" Sipek</name>
<email>jsipek@cs.sunysb.edu</email>
</author>
<published>2006-12-08T10:36:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d3ac7f892b7d07d61d0895caa4f6e190e43112f8'/>
<id>d3ac7f892b7d07d61d0895caa4f6e190e43112f8</id>
<content type='text'>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in linux/mm/.

Signed-off-by: Josef "Jeff" Sipek &lt;jsipek@cs.sunysb.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in linux/mm/.

Signed-off-by: Josef "Jeff" Sipek &lt;jsipek@cs.sunysb.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fadvise() make POSIX_FADV_NOREUSE a no-op</title>
<updated>2006-08-06T15:57:47+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2006-08-05T19:14:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60c371bc753495f36d3a71338b46030f7fffce3b'/>
<id>60c371bc753495f36d3a71338b46030f7fffce3b</id>
<content type='text'>
The POSIX_FADV_NOREUSE hint means "the application will use this range of the
file a single time".  It seems to be intended that the implementation will use
this hint to perform drop-behind of that part of the file when the application
gets around to reading or writing it.

However for reasons which aren't obvious (or sane?) I mapped
POSIX_FADV_NOREUSE onto POSIX_FADV_WILLNEED.  ie: it does readahead.

That's daft.  So for now, make POSIX_FADV_NOREUSE a no-op.

This is a non-back-compatible change.  If someone was using POSIX_FADV_NOREUSE
to perform readahead, they lose.  The likelihood is low.

If/when we later implement POSIX_FADV_NOREUSE things will get interesting - to
do it fully we'll need to maintain file offset/length ranges and peform all
sorts of complex tricks, and managing the lifetime of those ranges' data
structures will be interesting..

A sensible implementation would probably ignore the file range and would
simply mark the entire file as needing some form of drop-behind treatment.

Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The POSIX_FADV_NOREUSE hint means "the application will use this range of the
file a single time".  It seems to be intended that the implementation will use
this hint to perform drop-behind of that part of the file when the application
gets around to reading or writing it.

However for reasons which aren't obvious (or sane?) I mapped
POSIX_FADV_NOREUSE onto POSIX_FADV_WILLNEED.  ie: it does readahead.

That's daft.  So for now, make POSIX_FADV_NOREUSE a no-op.

This is a non-back-compatible change.  If someone was using POSIX_FADV_NOREUSE
to perform readahead, they lose.  The likelihood is low.

If/when we later implement POSIX_FADV_NOREUSE things will get interesting - to
do it fully we'll need to maintain file offset/length ranges and peform all
sorts of complex tricks, and managing the lifetime of those ranges' data
structures will be interesting..

A sensible implementation would probably ignore the file range and would
simply mark the entire file as needing some form of drop-behind treatment.

Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fadvise: remove dead comments</title>
<updated>2006-07-10T20:24:14+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2006-07-10T11:44:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8f72e4028a1ff968000cec4a034f45619fbd7ec4'/>
<id>8f72e4028a1ff968000cec4a034f45619fbd7ec4</id>
<content type='text'>
Cc: "Michael Kerrisk" &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cc: "Michael Kerrisk" &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] sys_sync_file_range()</title>
<updated>2006-03-31T20:18:54+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2006-03-31T10:30:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f79e2abb9bd452d97295f34376dedbec9686b986'/>
<id>f79e2abb9bd452d97295f34376dedbec9686b986</id>
<content type='text'>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:

- It's more flexible.  Things which would require two or three syscalls with
  fadvise() can be done in a single syscall.

- Using fadvise() in this manner is something not covered by POSIX.

The patch wires up the syscall for x86.

The sycall is implemented in the new fs/sync.c.  The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

Documentation for the syscall is in fs/sync.c.

A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the -&gt;fsync call for
NFS_DATA_SYNC which is hopefully the more common."

Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested.  This is trivial to fix: add a new flag bit, set
wbc-&gt;nonblocking.  But I'm not sure that we want to expose implementation
details down to that level.

Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).

Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
requests fail...

Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Neil Brown &lt;neilb@cse.unsw.edu.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:

- It's more flexible.  Things which would require two or three syscalls with
  fadvise() can be done in a single syscall.

- Using fadvise() in this manner is something not covered by POSIX.

The patch wires up the syscall for x86.

The sycall is implemented in the new fs/sync.c.  The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

Documentation for the syscall is in fs/sync.c.

A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the -&gt;fsync call for
NFS_DATA_SYNC which is hopefully the more common."

Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested.  This is trivial to fix: add a new flag bit, set
wbc-&gt;nonblocking.  But I'm not sure that we want to expose implementation
details down to that level.

Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).

Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
requests fail...

Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Neil Brown &lt;neilb@cse.unsw.edu.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
