<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/xattr.h, branch v6.0-rc1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>acl: move idmapped mount fixup into vfs_{g,s}etxattr()</title>
<updated>2022-07-15T20:08:59+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2022-07-06T16:30:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0c5fd887d2bb47aa37aa9fb1eb1d1d2abac62972'/>
<id>0c5fd887d2bb47aa37aa9fb1eb1d1d2abac62972</id>
<content type='text'>
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.

I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.

Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                       ________________________________
                            -&gt; cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -&gt; vfs_getxattr_alloc()                                                                        |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                    |
                                     -&gt; vfs_getxattr()                                                                        |
                                        -&gt; xattr_getsecurity()                                                                |
                                           -&gt; security_inode_getsecurity()                                                    |
                                              -&gt; cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                                _______________________________
                            -&gt; cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -&gt; vfs_getxattr_alloc()                                                                                |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                            |
                                    |-&gt; vfs_getxattr()                                                                                |
                                        -&gt; xattr_getsecurity()                                                                        |
                                           -&gt; security_inode_getsecurity()                                                            |
                                              -&gt; cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.

The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().

To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().

Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&amp;init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -&gt; generic_permission()
           -&gt; acl_permission_check()
              -&gt; check_acl()
                 -&gt; get_acl()
                    -&gt; inode-&gt;i_op-&gt;get_acl() == ovl_get_acl()
                        &gt; get_acl() /* on the underlying filesystem)
                          -&gt;inode-&gt;i_op-&gt;get_acl() == /*lower filesystem callback */
                 -&gt; posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.

The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.

Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.

I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.

Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                       ________________________________
                            -&gt; cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -&gt; vfs_getxattr_alloc()                                                                        |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                    |
                                     -&gt; vfs_getxattr()                                                                        |
                                        -&gt; xattr_getsecurity()                                                                |
                                           -&gt; security_inode_getsecurity()                                                    |
                                              -&gt; cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                                _______________________________
                            -&gt; cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -&gt; vfs_getxattr_alloc()                                                                                |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                            |
                                    |-&gt; vfs_getxattr()                                                                                |
                                        -&gt; xattr_getsecurity()                                                                        |
                                           -&gt; security_inode_getsecurity()                                                            |
                                              -&gt; cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.

The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().

To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().

Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&amp;init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -&gt; generic_permission()
           -&gt; acl_permission_check()
              -&gt; check_acl()
                 -&gt; get_acl()
                    -&gt; inode-&gt;i_op-&gt;get_acl() == ovl_get_acl()
                        &gt; get_acl() /* on the underlying filesystem)
                          -&gt;inode-&gt;i_op-&gt;get_acl() == /*lower filesystem callback */
                 -&gt; posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.

The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.

Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xattr: handle idmapped mounts</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Tycho Andersen</name>
<email>tycho@tycho.pizza</email>
</author>
<published>2021-01-21T13:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c7c7a1a18af4c3bb7749d33e3df3acdf0a95bbb5'/>
<id>c7c7a1a18af4c3bb7749d33e3df3acdf0a95bbb5</id>
<content type='text'>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Signed-off-by: Tycho Andersen &lt;tycho@tycho.pizza&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Signed-off-by: Tycho Andersen &lt;tycho@tycho.pizza&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>acl: handle idmapped mounts</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e65ce2a50cf6af216bea6fd80d771fcbb4c0aaa1'/>
<id>e65ce2a50cf6af216bea6fd80d771fcbb4c0aaa1</id>
<content type='text'>
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the -&gt;set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the -&gt;set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6</title>
<updated>2020-08-09T20:58:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-09T20:58:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7a6b60441f02f6e22e7c0936ef16fa3f51832a48'/>
<id>7a6b60441f02f6e22e7c0936ef16fa3f51832a48</id>
<content type='text'>
Pull NFS server updates from Chuck Lever:
 "Highlights:
   - Support for user extended attributes on NFS (RFC 8276)
   - Further reduce unnecessary NFSv4 delegation recalls

  Notable fixes:
   - Fix recent krb5p regression
   - Address a few resource leaks and a rare NULL dereference

  Other:
   - De-duplicate RPC/RDMA error handling and other utility functions
   - Replace storage and display of kernel memory addresses by tracepoints"

* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
  svcrdma: CM event handler clean up
  svcrdma: Remove transport reference counting
  svcrdma: Fix another Receive buffer leak
  SUNRPC: Refresh the show_rqstp_flags() macro
  nfsd: netns.h: delete a duplicated word
  SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
  nfsd: avoid a NULL dereference in __cld_pipe_upcall()
  nfsd4: a client's own opens needn't prevent delegations
  nfsd: Use seq_putc() in two functions
  svcrdma: Display chunk completion ID when posting a rw_ctxt
  svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
  svcrdma: Introduce Send completion IDs
  svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
  svcrdma: Introduce Receive completion IDs
  svcrdma: Introduce infrastructure to support completion IDs
  svcrdma: Add common XDR encoders for RDMA and Read segments
  svcrdma: Add common XDR decoders for RDMA and Read segments
  SUNRPC: Add helpers for decoding list discriminators symbolically
  svcrdma: Remove declarations for functions long removed
  svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull NFS server updates from Chuck Lever:
 "Highlights:
   - Support for user extended attributes on NFS (RFC 8276)
   - Further reduce unnecessary NFSv4 delegation recalls

  Notable fixes:
   - Fix recent krb5p regression
   - Address a few resource leaks and a rare NULL dereference

  Other:
   - De-duplicate RPC/RDMA error handling and other utility functions
   - Replace storage and display of kernel memory addresses by tracepoints"

* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
  svcrdma: CM event handler clean up
  svcrdma: Remove transport reference counting
  svcrdma: Fix another Receive buffer leak
  SUNRPC: Refresh the show_rqstp_flags() macro
  nfsd: netns.h: delete a duplicated word
  SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
  nfsd: avoid a NULL dereference in __cld_pipe_upcall()
  nfsd4: a client's own opens needn't prevent delegations
  nfsd: Use seq_putc() in two functions
  svcrdma: Display chunk completion ID when posting a rw_ctxt
  svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
  svcrdma: Introduce Send completion IDs
  svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
  svcrdma: Introduce Receive completion IDs
  svcrdma: Introduce infrastructure to support completion IDs
  svcrdma: Add common XDR encoders for RDMA and Read segments
  svcrdma: Add common XDR decoders for RDMA and Read segments
  SUNRPC: Add helpers for decoding list discriminators symbolically
  svcrdma: Remove declarations for functions long removed
  svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way</title>
<updated>2020-07-24T19:42:41+00:00</updated>
<author>
<name>Chengguang Xu</name>
<email>cgxu519@mykernel.net</email>
</author>
<published>2020-07-24T04:15:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3bef735ad7b7d987069181e7b58588043cbd1509'/>
<id>3bef735ad7b7d987069181e7b58588043cbd1509</id>
<content type='text'>
After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of
kmalloc"), simple xattr entry is allocated with kvmalloc() instead of
kmalloc(), so we should release it with kvfree() instead of kfree().

Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc")
Signed-off-by: Chengguang Xu &lt;cgxu519@mykernel.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Andreas Dilger &lt;adilger@dilger.ca&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.7]
Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of
kmalloc"), simple xattr entry is allocated with kvmalloc() instead of
kmalloc(), so we should release it with kvfree() instead of kfree().

Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc")
Signed-off-by: Chengguang Xu &lt;cgxu519@mykernel.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Andreas Dilger &lt;adilger@dilger.ca&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.7]
Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xattr: add a function to check if a namespace is supported</title>
<updated>2020-07-13T21:27:03+00:00</updated>
<author>
<name>Frank van der Linden</name>
<email>fllinden@amazon.com</email>
</author>
<published>2020-06-23T22:39:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cab8d289c5ad541a5351a651d95c4086b7f84d7c'/>
<id>cab8d289c5ad541a5351a651d95c4086b7f84d7c</id>
<content type='text'>
Add a function that checks is an extended attribute namespace is
supported for an inode, meaning that a handler must be present
for either the whole namespace, or at least one synthetic
xattr in the namespace.

To be used by the nfs server code when being queried for extended
attributes support.

Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Frank van der Linden &lt;fllinden@amazon.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a function that checks is an extended attribute namespace is
supported for an inode, meaning that a handler must be present
for either the whole namespace, or at least one synthetic
xattr in the namespace.

To be used by the nfs server code when being queried for extended
attributes support.

Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Frank van der Linden &lt;fllinden@amazon.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xattr: break delegations in {set,remove}xattr</title>
<updated>2020-07-13T21:27:03+00:00</updated>
<author>
<name>Frank van der Linden</name>
<email>fllinden@amazon.com</email>
</author>
<published>2020-06-23T22:39:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=08b5d5014a27e717826999ad20e394a8811aae92'/>
<id>08b5d5014a27e717826999ad20e394a8811aae92</id>
<content type='text'>
set/removexattr on an exported filesystem should break NFS delegations.
This is true in general, but also for the upcoming support for
RFC 8726 (NFSv4 extended attribute support). Make sure that they do.

Additionally, they need to grow a _locked variant, since callers might
call this with i_rwsem held (like the NFS server code).

Cc: stable@vger.kernel.org # v4.9+
Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Frank van der Linden &lt;fllinden@amazon.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
set/removexattr on an exported filesystem should break NFS delegations.
This is true in general, but also for the upcoming support for
RFC 8726 (NFSv4 extended attribute support). Make sure that they do.

Additionally, they need to grow a _locked variant, since callers might
call this with i_rwsem held (like the NFS server code).

Cc: stable@vger.kernel.org # v4.9+
Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Frank van der Linden &lt;fllinden@amazon.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xattr.h: Replace zero-length array with flexible-array member</title>
<updated>2020-04-18T20:44:56+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2020-03-24T00:41:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=43951585e1308b322c8ee31a4aafd08213f5c5d7'/>
<id>43951585e1308b322c8ee31a4aafd08213f5c5d7</id>
<content type='text'>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernfs: Add removed_size out param for simple_xattr_set</title>
<updated>2020-03-16T19:53:47+00:00</updated>
<author>
<name>Daniel Xu</name>
<email>dxu@dxuuu.xyz</email>
</author>
<published>2020-03-12T20:03:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a46a22955bae16fc5a756af7188d3ccb25c3f797'/>
<id>a46a22955bae16fc5a756af7188d3ccb25c3f797</id>
<content type='text'>
This helps set up size accounting in the next commit. Without this out
param, it's difficult to find out the removed xattr size without taking
a lock for longer and walking the xattr linked list twice.

Signed-off-by: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This helps set up size accounting in the next commit. Without this out
param, it's difficult to find out the removed xattr size without taking
a lock for longer and walking the xattr linked list twice.

Signed-off-by: Daniel Xu &lt;dxu@dxuuu.xyz&gt;
Acked-by: Chris Down &lt;chris@chrisdown.name&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>make xattr_getsecurity() static</title>
<updated>2018-05-14T13:51:34+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2018-04-25T01:22:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2220c5b0a7fb7d7d00c9a0b1e5222a7ae1f35956'/>
<id>2220c5b0a7fb7d7d00c9a0b1e5222a7ae1f35956</id>
<content type='text'>
many years overdue...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
many years overdue...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
