<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/fs.h, branch v2.6.16.15</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>[PATCH] Simplify proc/devices and fix early termination regression</title>
<updated>2006-05-01T19:03:43+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2006-04-21T08:51:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=692c0509fd0719406f8f781d9a9f2e19aa6b7c0a'/>
<id>692c0509fd0719406f8f781d9a9f2e19aa6b7c0a</id>
<content type='text'>
Repair /proc/devices early-termination regression.

2.6.16 broke /proc/devices.  An application often gets an
EOF before the end of data is reached, if that application
uses a series of short read(2)s to access the data.  I have
used read buffers of varying sizes with varying degrees
of unsuccess (larger sizes get further into the data than
smaller sizes, following a simple pattern).  It appears
that the only safe way to get the data is to use a single
read buffer larger than all the data in /proc/devices.

The following example demonstates the problem:

    # dd if=/proc/devices bs=1
    Character devices:
      1 mem
    27+0 records in
    27+0 records out

This patch is a backport of the fix recently accepted to
Linus's tree:

    commit 68eef3b4791572ecb70249c7fb145bb3742dd899
    [PATCH] Simplify proc/devices and fix early termination regression

It replaces the complex, state-machine algorithm introduced
in 2.6.16 with a simple algorithm, modeled on the implementation
of /proc/interrupts.

[akpm@osdl.org: cleanups, simplifications]

Signed-off-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Repair /proc/devices early-termination regression.

2.6.16 broke /proc/devices.  An application often gets an
EOF before the end of data is reached, if that application
uses a series of short read(2)s to access the data.  I have
used read buffers of varying sizes with varying degrees
of unsuccess (larger sizes get further into the data than
smaller sizes, following a simple pattern).  It appears
that the only safe way to get the data is to use a single
read buffer larger than all the data in /proc/devices.

The following example demonstates the problem:

    # dd if=/proc/devices bs=1
    Character devices:
      1 mem
    27+0 records in
    27+0 records out

This patch is a backport of the fix recently accepted to
Linus's tree:

    commit 68eef3b4791572ecb70249c7fb145bb3742dd899
    [PATCH] Simplify proc/devices and fix early termination regression

It replaces the complex, state-machine algorithm introduced
in 2.6.16 with a simple algorithm, modeled on the implementation
of /proc/interrupts.

[akpm@osdl.org: cleanups, simplifications]

Signed-off-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside</title>
<updated>2006-03-11T17:19:34+00:00</updated>
<author>
<name>Kirill Korotaev</name>
<email>dev@openvz.org</email>
</author>
<published>2006-03-11T11:27:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0adb25d2e71ab047423d6fc63d5d184590d0a66f'/>
<id>0adb25d2e71ab047423d6fc63d5d184590d0a66f</id>
<content type='text'>
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink().  Such allocation may re-enter ext3 code from
try_to_free_pages.  But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.

This bug led to "Assertion failure in journal_dirty_metadata()" messages.

http://bugzilla.openvz.org/show_bug.cgi?id=115

Signed-off-by: Andrey Savochkin &lt;saw@saw.sw.com.sg&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink().  Such allocation may re-enter ext3 code from
try_to_free_pages.  But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.

This bug led to "Assertion failure in journal_dirty_metadata()" messages.

http://bugzilla.openvz.org/show_bug.cgi?id=115

Signed-off-by: Andrey Savochkin &lt;saw@saw.sw.com.sg&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fix file counting</title>
<updated>2006-03-08T22:14:01+00:00</updated>
<author>
<name>Dipankar Sarma</name>
<email>dipankar@in.ibm.com</email>
</author>
<published>2006-03-08T05:55:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=529bf6be5c04f2e869d07bfdb122e9fd98ade714'/>
<id>529bf6be5c04f2e869d07bfdb122e9fd98ade714</id>
<content type='text'>
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench.  Tested on both x86_64 and powerpc.

The way we do file struct accounting is not very suitable for batched
freeing.  For scalability reasons, file accounting was
constructor/destructor based.  This meant that nr_files was decremented
only when the object was removed from the slab cache.  This is susceptible
to slab fragmentation.  With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730  0       758844

At the same time, I see only a 2000+ objects in filp cache.  The following
patch I fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api.  In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.

Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.

Signed-off-by: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench.  Tested on both x86_64 and powerpc.

The way we do file struct accounting is not very suitable for batched
freeing.  For scalability reasons, file accounting was
constructor/destructor based.  This meant that nr_files was decremented
only when the object was removed from the slab cache.  This is susceptible
to slab fragmentation.  With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730  0       758844

At the same time, I see only a 2000+ objects in filp cache.  The following
patch I fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api.  In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.

Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.

Signed-off-by: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Mark the pipe file operations static</title>
<updated>2006-03-08T22:03:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@g5.osdl.org</email>
</author>
<published>2006-03-08T22:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a19cbd4bf258840ade3b6ee9e9256006d0644e09'/>
<id>a19cbd4bf258840ade3b6ee9e9256006d0644e09</id>
<content type='text'>
They aren't used (nor even really usable) outside of pipe.c anyway

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They aren't used (nor even really usable) outside of pipe.c anyway

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Direct Migration V9: Avoid writeback / page_migrate() method</title>
<updated>2006-02-01T16:53:17+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-02-01T11:05:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e965f9630c651fa4249039fd4b80c9392d07a856'/>
<id>e965f9630c651fa4249039fd4b80c9392d07a856</id>
<content type='text'>
Migrate a page with buffers without requiring writeback

This introduces a new address space operation migratepage() that may be used
by a filesystem to implement its own version of page migration.

A version is provided that migrates buffers attached to pages.  Some
filesystems (ext2, ext3, xfs) are modified to utilize this feature.

The swapper address space operation are modified so that a regular
migrate_page() will occur for anonymous pages without writeback (migrate_pages
forces every anonymous page to have a swap entry).

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Migrate a page with buffers without requiring writeback

This introduces a new address space operation migratepage() that may be used
by a filesystem to implement its own version of page migration.

A version is provided that migrates buffers attached to pages.  Some
filesystems (ext2, ext3, xfs) are modified to utilize this feature.

The swapper address space operation are modified so that a regular
migrate_page() will occur for anonymous pages without writeback (migrate_pages
forces every anonymous page to have a swap entry).

Signed-off-by: Mike Kravetz &lt;kravetz@us.ibm.com&gt;
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] vfs: *at functions: core</title>
<updated>2006-01-19T03:20:29+00:00</updated>
<author>
<name>Ulrich Drepper</name>
<email>drepper@redhat.com</email>
</author>
<published>2006-01-19T01:43:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5590ff0d5528b60153c0b4e7b771472b5a95e297'/>
<id>5590ff0d5528b60153c0b4e7b771472b5a95e297</id>
<content type='text'>
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name.  These functions, openat etc, have been discussed on numerous
occasions.  They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.

We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic.  But this code is rather expensive.  Here are some
results (similar to what Jim Meyering posted before).

The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
rm -fr is used to remove all directories.  Without syscall support I get
this:

real    0m31.921s
user    0m0.688s
sys     0m31.234s

With syscall support the results are much better:

real    0m20.699s
user    0m0.536s
sys     0m20.149s

The interfaces are for obvious reasons currently not much used.  But they'll
be used.  coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them.  I expect a patch to make follow soon.  Every program which is walking
the filesystem tree will benefit.

Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@ftp.linux.org.uk&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name.  These functions, openat etc, have been discussed on numerous
occasions.  They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.

We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic.  But this code is rather expensive.  Here are some
results (similar to what Jim Meyering posted before).

The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
rm -fr is used to remove all directories.  Without syscall support I get
this:

real    0m31.921s
user    0m0.688s
sys     0m31.234s

With syscall support the results are much better:

real    0m20.699s
user    0m0.536s
sys     0m20.149s

The interfaces are for obvious reasons currently not much used.  But they'll
be used.  coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them.  I expect a patch to make follow soon.  Every program which is walking
the filesystem tree will benefit.

Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@ftp.linux.org.uk&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] add /sys/fs</title>
<updated>2006-01-17T07:15:29+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>miklos@szeredi.hu</email>
</author>
<published>2006-01-17T06:14:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f87fd4c2a0c4f3baad28057360b36a59591ef751'/>
<id>f87fd4c2a0c4f3baad28057360b36a59591ef751</id>
<content type='text'>
This patch adds an empty /sys/fs, which filesystems can use.

Signed-off-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds an empty /sys/fs, which filesystems can use.

Signed-off-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] convert /proc/devices to use seq_file interface</title>
<updated>2006-01-15T02:25:19+00:00</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@redhat.com</email>
</author>
<published>2006-01-14T21:20:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7170be5f586b59bdcdab082778a5d9203ba7b667'/>
<id>7170be5f586b59bdcdab082778a5d9203ba7b667</id>
<content type='text'>
A Christoph suggested that the /proc/devices file be converted to use the
seq_file interface.  This patch does that.

I've obxerved one or two installation that had sufficiently large sans that
they overran the 4k limit on /proc/devices.

Signed-off-by: Neil Horman &lt;nhorman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A Christoph suggested that the /proc/devices file be converted to use the
seq_file interface.  This patch does that.

I've obxerved one or two installation that had sufficiently large sans that
they overran the 4k limit on /proc/devices.

Signed-off-by: Neil Horman &lt;nhorman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] per-mountpoint noatime/nodiratime</title>
<updated>2006-01-10T16:01:34+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2006-01-10T04:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fc33a7bb9c6dd8f6e4a014976200f8fdabb3a45c'/>
<id>fc33a7bb9c6dd8f6e4a014976200f8fdabb3a45c</id>
<content type='text'>
Turn noatime and nodiratime into per-mount instead of per-sb flags.

After all the preparations this is a rather trivial patch.  The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore.  Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.

While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Turn noatime and nodiratime into per-mount instead of per-sb flags.

After all the preparations this is a rather trivial patch.  The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore.  Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.

While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] remove update_atime</title>
<updated>2006-01-10T16:01:31+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2006-01-10T04:52:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=869243a0f6143f76e7c847e707eee6ece9cbf821'/>
<id>869243a0f6143f76e7c847e707eee6ece9cbf821</id>
<content type='text'>
All callers use touch_atime now which takes a vfsmount and allows us to
implement per-mount noatime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All callers use touch_atime now which takes a vfsmount and allows us to
implement per-mount noatime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
