<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/ceph, branch v3.9.11</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ceph: fix sleeping function called from invalid context.</title>
<updated>2013-07-13T18:39:17+00:00</updated>
<author>
<name>majianpeng</name>
<email>majianpeng@gmail.com</email>
</author>
<published>2013-06-19T06:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4e94a75e20addc7086627ceab0d1c90f3b27a205'/>
<id>4e94a75e20addc7086627ceab0d1c90f3b27a205</id>
<content type='text'>
commit a1dc1937337a93e699eaa56968b7de6e1a9e77cf upstream.

[ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
[ 1121.231971] 1 lock held by mv/9831:
[ 1121.231973]  #0:  (&amp;(&amp;ci-&gt;i_ceph_lock)-&gt;rlock){+.+...},at:[&lt;ffffffffa02bbd38&gt;] ceph_getxattr+0x58/0x1d0 [ceph]
[ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
[ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 1121.232027]  ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
[ 1121.232045]  ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
[ 1121.232052]  0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
[ 1121.232056] Call Trace:
[ 1121.232062]  [&lt;ffffffff8168348c&gt;] dump_stack+0x19/0x1b
[ 1121.232067]  [&lt;ffffffff81070435&gt;] __might_sleep+0xe5/0x110
[ 1121.232071]  [&lt;ffffffff816899ba&gt;] down_read+0x2a/0x98
[ 1121.232080]  [&lt;ffffffffa02baf70&gt;] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
[ 1121.232088]  [&lt;ffffffffa02bbd7f&gt;] ceph_getxattr+0x9f/0x1d0 [ceph]
[ 1121.232093]  [&lt;ffffffff81188d28&gt;] vfs_getxattr+0xa8/0xd0
[ 1121.232097]  [&lt;ffffffff8118900b&gt;] getxattr+0xab/0x1c0
[ 1121.232100]  [&lt;ffffffff811704f2&gt;] ? final_putname+0x22/0x50
[ 1121.232104]  [&lt;ffffffff81155f80&gt;] ? kmem_cache_free+0xb0/0x260
[ 1121.232107]  [&lt;ffffffff811704f2&gt;] ? final_putname+0x22/0x50
[ 1121.232110]  [&lt;ffffffff8109e63d&gt;] ? trace_hardirqs_on+0xd/0x10
[ 1121.232114]  [&lt;ffffffff816957a7&gt;] ? sysret_check+0x1b/0x56
[ 1121.232120]  [&lt;ffffffff81189c9c&gt;] SyS_fgetxattr+0x6c/0xc0
[ 1121.232125]  [&lt;ffffffff81695782&gt;] system_call_fastpath+0x16/0x1b
[ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
[ 1121.232154] 1 lock held by mv/9831:
[ 1121.232156]  #0:  (&amp;(&amp;ci-&gt;i_ceph_lock)-&gt;rlock){+.+...}, at:
[&lt;ffffffffa02bbd38&gt;] ceph_getxattr+0x58/0x1d0 [ceph]

I think move the ci-&gt;i_ceph_lock down is safe because we can't free
ceph_inode_info at there.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a1dc1937337a93e699eaa56968b7de6e1a9e77cf upstream.

[ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
[ 1121.231971] 1 lock held by mv/9831:
[ 1121.231973]  #0:  (&amp;(&amp;ci-&gt;i_ceph_lock)-&gt;rlock){+.+...},at:[&lt;ffffffffa02bbd38&gt;] ceph_getxattr+0x58/0x1d0 [ceph]
[ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
[ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 1121.232027]  ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
[ 1121.232045]  ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
[ 1121.232052]  0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
[ 1121.232056] Call Trace:
[ 1121.232062]  [&lt;ffffffff8168348c&gt;] dump_stack+0x19/0x1b
[ 1121.232067]  [&lt;ffffffff81070435&gt;] __might_sleep+0xe5/0x110
[ 1121.232071]  [&lt;ffffffff816899ba&gt;] down_read+0x2a/0x98
[ 1121.232080]  [&lt;ffffffffa02baf70&gt;] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
[ 1121.232088]  [&lt;ffffffffa02bbd7f&gt;] ceph_getxattr+0x9f/0x1d0 [ceph]
[ 1121.232093]  [&lt;ffffffff81188d28&gt;] vfs_getxattr+0xa8/0xd0
[ 1121.232097]  [&lt;ffffffff8118900b&gt;] getxattr+0xab/0x1c0
[ 1121.232100]  [&lt;ffffffff811704f2&gt;] ? final_putname+0x22/0x50
[ 1121.232104]  [&lt;ffffffff81155f80&gt;] ? kmem_cache_free+0xb0/0x260
[ 1121.232107]  [&lt;ffffffff811704f2&gt;] ? final_putname+0x22/0x50
[ 1121.232110]  [&lt;ffffffff8109e63d&gt;] ? trace_hardirqs_on+0xd/0x10
[ 1121.232114]  [&lt;ffffffff816957a7&gt;] ? sysret_check+0x1b/0x56
[ 1121.232120]  [&lt;ffffffff81189c9c&gt;] SyS_fgetxattr+0x6c/0xc0
[ 1121.232125]  [&lt;ffffffff81695782&gt;] system_call_fastpath+0x16/0x1b
[ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
[ 1121.232154] 1 lock held by mv/9831:
[ 1121.232156]  #0:  (&amp;(&amp;ci-&gt;i_ceph_lock)-&gt;rlock){+.+...}, at:
[&lt;ffffffffa02bbd38&gt;] ceph_getxattr+0x58/0x1d0 [ceph]

I think move the ci-&gt;i_ceph_lock down is safe because we can't free
ceph_inode_info at there.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: wrap auth ops in wrapper functions</title>
<updated>2013-06-20T19:01:32+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-03-25T17:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1a14fad287936577daf0533caa94f5c5038842e7'/>
<id>1a14fad287936577daf0533caa94f5c5038842e7</id>
<content type='text'>
commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add update_authorizer auth method</title>
<updated>2013-06-20T19:01:32+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-03-25T17:26:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d2c7223497cf8228416c70e3f4238ddd6c5bdf3c'/>
<id>d2c7223497cf8228416c70e3f4238ddd6c5bdf3c</id>
<content type='text'>
commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: ceph_pagelist_append might sleep while atomic</title>
<updated>2013-06-20T19:01:27+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a798296068abd043608c10ccb69209a7c2f0579'/>
<id>4a798296068abd043608c10ccb69209a7c2f0579</id>
<content type='text'>
commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add cpu_to_le32() calls when encoding a reconnect capability</title>
<updated>2013-06-20T19:01:27+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7662e04a958484d22fa9f18c7c6c35e0230d9207'/>
<id>7662e04a958484d22fa9f18c7c6c35e0230d9207</id>
<content type='text'>
commit c420276a532a10ef59849adc2681f45306166b89 upstream.

In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c420276a532a10ef59849adc2681f45306166b89 upstream.

In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>fs: Limit sys_mount to only request filesystem modules.</title>
<updated>2013-03-04T03:36:31+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-03-03T03:39:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f78e0351394052e1a6293e175825eb5c7869507'/>
<id>7f78e0351394052e1a6293e175825eb5c7869507</id>
<content type='text'>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-by: Kees Cook &lt;keescook@google.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-by: Kees Cook &lt;keescook@google.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client</title>
<updated>2013-03-01T01:43:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-03-01T01:43:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1cf0209c431fa7790253c532039d53b0773193aa'/>
<id>1cf0209c431fa7790253c532039d53b0773193aa</id>
<content type='text'>
Pull Ceph updates from Sage Weil:
 "A few groups of patches here.  Alex has been hard at work improving
  the RBD code, layout groundwork for understanding the new formats and
  doing layering.  Most of the infrastructure is now in place for the
  final bits that will come with the next window.

  There are a few changes to the data layout.  Jim Schutt's patch fixes
  some non-ideal CRUSH behavior, and a set of patches from me updates
  the client to speak a newer version of the protocol and implement an
  improved hashing strategy across storage nodes (when the server side
  supports it too).

  A pair of patches from Sam Lang fix the atomicity of open+create
  operations.  Several patches from Yan, Zheng fix various mds/client
  issues that turned up during multi-mds torture tests.

  A final set of patches expose file layouts via virtual xattrs, and
  allow the policies to be set on directories via xattrs as well
  (avoiding the awkward ioctl interface and providing a consistent
  interface for both kernel mount and ceph-fuse users)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
  libceph: add support for HASHPSPOOL pool flag
  libceph: update osd request/reply encoding
  libceph: calculate placement based on the internal data types
  ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
  ceph: update "ceph_features.h"
  libceph: decode into cpu-native ceph_pg type
  libceph: rename ceph_pg -&gt; ceph_pg_v1
  rbd: pass length, not op for osd completions
  rbd: move rbd_osd_trivial_callback()
  libceph: use a do..while loop in con_work()
  libceph: use a flag to indicate a fault has occurred
  libceph: separate non-locked fault handling
  libceph: encapsulate connection backoff
  libceph: eliminate sparse warnings
  ceph: eliminate sparse warnings in fs code
  rbd: eliminate sparse warnings
  libceph: define connection flag helpers
  rbd: normalize dout() calls
  rbd: barriers are hard
  rbd: ignore zero-length requests
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull Ceph updates from Sage Weil:
 "A few groups of patches here.  Alex has been hard at work improving
  the RBD code, layout groundwork for understanding the new formats and
  doing layering.  Most of the infrastructure is now in place for the
  final bits that will come with the next window.

  There are a few changes to the data layout.  Jim Schutt's patch fixes
  some non-ideal CRUSH behavior, and a set of patches from me updates
  the client to speak a newer version of the protocol and implement an
  improved hashing strategy across storage nodes (when the server side
  supports it too).

  A pair of patches from Sam Lang fix the atomicity of open+create
  operations.  Several patches from Yan, Zheng fix various mds/client
  issues that turned up during multi-mds torture tests.

  A final set of patches expose file layouts via virtual xattrs, and
  allow the policies to be set on directories via xattrs as well
  (avoiding the awkward ioctl interface and providing a consistent
  interface for both kernel mount and ceph-fuse users)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
  libceph: add support for HASHPSPOOL pool flag
  libceph: update osd request/reply encoding
  libceph: calculate placement based on the internal data types
  ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
  ceph: update "ceph_features.h"
  libceph: decode into cpu-native ceph_pg type
  libceph: rename ceph_pg -&gt; ceph_pg_v1
  rbd: pass length, not op for osd completions
  rbd: move rbd_osd_trivial_callback()
  libceph: use a do..while loop in con_work()
  libceph: use a flag to indicate a fault has occurred
  libceph: separate non-locked fault handling
  libceph: encapsulate connection backoff
  libceph: eliminate sparse warnings
  ceph: eliminate sparse warnings in fs code
  rbd: eliminate sparse warnings
  libceph: define connection flag helpers
  rbd: normalize dout() calls
  rbd: barriers are hard
  rbd: ignore zero-length requests
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2013-02-27T04:16:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-27T04:16:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d895cb1af15c04c522a25c79cc429076987c089b'/>
<id>d895cb1af15c04c522a25c79cc429076987c089b</id>
<content type='text'>
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing -&gt;d_name/-&gt;d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has -&gt;d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both -&gt;f_pos and -&gt;f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing -&gt;d_name/-&gt;d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has -&gt;d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both -&gt;f_pos and -&gt;f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: update osd request/reply encoding</title>
<updated>2013-02-26T23:02:50+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-02-26T00:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1b83bef24c6746a146d39915a18fb5425f2facb0'/>
<id>1b83bef24c6746a146d39915a18fb5425f2facb0</id>
<content type='text'>
Use the new version of the encoding for osd requests and replies.  In the
process, update the way we are tracking request ops and reply lengths and
results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
appropriately.

The main changes are:
 - we keep pointers into the request memory for fields we need to update
   each time the request is sent out over the wire
 - we keep information about the result in an array in the request struct
   where the users can easily get at it.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the new version of the encoding for osd requests and replies.  In the
process, update the way we are tracking request ops and reply lengths and
results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
appropriately.

The main changes are:
 - we keep pointers into the request memory for fields we need to update
   each time the request is sent out over the wire
 - we keep information about the result in an array in the request struct
   where the users can easily get at it.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: calculate placement based on the internal data types</title>
<updated>2013-02-26T23:02:37+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-02-26T00:13:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2169aea649c08374bec7d220a3b8f64712275356'/>
<id>2169aea649c08374bec7d220a3b8f64712275356</id>
<content type='text'>
Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type.  This allows us to
pass the full 32-bit precision of the pgid.seed to the callers.  It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type.  This allows us to
pass the full 32-bit precision of the pgid.seed to the callers.  It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
