<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/Documentation/trace, branch v6.2-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2022-12-22T02:57:24+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-22T02:57:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=af9b3fa15d6d99d948bcaca5a036ad2b292c8e8a'/>
<id>af9b3fa15d6d99d948bcaca5a036ad2b292c8e8a</id>
<content type='text'>
Pull trace probes updates from Steven Rostedt:

 - New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address

 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).

 - And minor fixes and clean ups

* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Reject symbol/symstr type for uprobe
  tracing/probes: Add symstr type for dynamic events
  kprobes: kretprobe events missing on 2-core KVM guest
  kprobes: Fix check for probe enabled in kill_kprobe()
  test_kprobes: Fix implicit declaration error of test_kprobes
  tracing: Fix race where eprobes can be called before the event
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull trace probes updates from Steven Rostedt:

 - New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address

 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).

 - And minor fixes and clean ups

* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Reject symbol/symstr type for uprobe
  tracing/probes: Add symstr type for dynamic events
  kprobes: kretprobe events missing on 2-core KVM guest
  kprobes: Fix check for probe enabled in kill_kprobe()
  test_kprobes: Fix implicit declaration error of test_kprobes
  tracing: Fix race where eprobes can be called before the event
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2022-12-16T02:01:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-16T02:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe36bb8736ee9e38fa6173e1271ed8c5cf7bc907'/>
<id>fe36bb8736ee9e38fa6173e1271ed8c5cf7bc907</id>
<content type='text'>
Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing/probes: Add symstr type for dynamic events</title>
<updated>2022-12-14T23:48:57+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2022-11-14T04:47:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b26a124cbfa80f42bfc4e63e1d5643ca98159d66'/>
<id>b26a124cbfa80f42bfc4e63e1d5643ca98159d66</id>
<content type='text'>
Add 'symstr' type for storing the kernel symbol as a string data
instead of the symbol address. This allows us to filter the
events by wildcard symbol name.

e.g.
  # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' &gt;&gt; dynamic_events
  # cat events/eprobes/wqfunc/format
  name: wqfunc
  ID: 2110
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:__data_loc char[] symname;	offset:8;	size:4;	signed:1;

  print fmt: " symname=\"%s\"", __get_str(symname)

Note that there is already 'symbol' type which just change the
print format (so it still stores the symbol address in the tracing
ring buffer.) On the other hand, 'symstr' type stores the actual
"symbol+offset/size" data as a string.

Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add 'symstr' type for storing the kernel symbol as a string data
instead of the symbol address. This allows us to filter the
events by wildcard symbol name.

e.g.
  # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' &gt;&gt; dynamic_events
  # cat events/eprobes/wqfunc/format
  name: wqfunc
  ID: 2110
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:__data_loc char[] symname;	offset:8;	size:4;	signed:1;

  print fmt: " symname=\"%s\"", __get_str(symname)

Note that there is already 'symbol' type which just change the
print format (so it still stores the symbol address in the tracing
ring buffer.) On the other hand, 'symstr' type stores the actual
"symbol+offset/size" data as a string.

Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kprobes: kretprobe events missing on 2-core KVM guest</title>
<updated>2022-12-14T23:48:40+00:00</updated>
<author>
<name>wuqiang</name>
<email>wuqiang.matt@bytedance.com</email>
</author>
<published>2022-11-10T08:15:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b7ddab8a19aefc768f345fd3782af35b4a68d9b'/>
<id>3b7ddab8a19aefc768f345fd3782af35b4a68d9b</id>
<content type='text'>
Default value of maxactive is set as num_possible_cpus() for nonpreemptable
systems. For a 2-core system, only 2 kretprobe instances would be allocated
in default, then these 2 instances for execve kretprobe are very likely to
be used up with a pipelined command.

Here's the testcase: a shell script was added to crontab, and the content
of the script is:

  #!/bin/sh
  do_something_magic `tr -dc a-z &lt; /dev/urandom | head -c 10`

cron will trigger a series of program executions (4 times every hour). Then
events loss would be noticed normally after 3-4 hours of testings.

The issue is caused by a burst of series of execve requests. The best number
of kretprobe instances could be different case by case, and should be user's
duty to determine, but num_possible_cpus() as the default value is inadequate
especially for systems with small number of cpus.

This patch enables the logic for preemption as default, thus increases the
minimum of maxactive to 10 for nonpreemptable systems.

Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang &lt;wuqiang.matt@bytedance.com&gt;
Reviewed-by: Solar Designer &lt;solar@openwall.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Default value of maxactive is set as num_possible_cpus() for nonpreemptable
systems. For a 2-core system, only 2 kretprobe instances would be allocated
in default, then these 2 instances for execve kretprobe are very likely to
be used up with a pipelined command.

Here's the testcase: a shell script was added to crontab, and the content
of the script is:

  #!/bin/sh
  do_something_magic `tr -dc a-z &lt; /dev/urandom | head -c 10`

cron will trigger a series of program executions (4 times every hour). Then
events loss would be noticed normally after 3-4 hours of testings.

The issue is caused by a burst of series of execve requests. The best number
of kretprobe instances could be different case by case, and should be user's
duty to determine, but num_possible_cpus() as the default value is inadequate
especially for systems with small number of cpus.

This patch enables the logic for preemption as default, thus increases the
minimum of maxactive to 10 for nonpreemptable systems.

Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang &lt;wuqiang.matt@bytedance.com&gt;
Reviewed-by: Solar Designer &lt;solar@openwall.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping</title>
<updated>2022-12-13T03:03:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-13T03:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cf619f891971bfac659ac64968f8c35db605c884'/>
<id>cf619f891971bfac659ac64968f8c35db605c884</id>
<content type='text'>
Pull setgid inheritance updates from Christian Brauner:
 "This contains the work to make setgid inheritance consistent between
  modifying a file and when changing ownership or mode as this has been
  a repeated source of very subtle bugs. The gist is that we perform the
  same permission checks in the write path as we do in the ownership and
  mode changing paths after this series where we're currently doing
  different things.

  We've already made setgid inheritance a lot more consistent and
  reliable in the last releases by moving setgid stripping from the
  individual filesystems up into the vfs. This aims to make the logic
  even more consistent and easier to understand and also to fix
  long-standing overlayfs setgid inheritance bugs. Miklos was nice
  enough to just let me carry the trivial overlayfs patches from Amir
  too.

  Below is a more detailed explanation how the current difference in
  setgid handling lead to very subtle bugs exemplified via overlayfs
  which is a victim of the current rules. I hope this explains why I
  think taking the regression risk here is worth it.

  A long while ago I found a few setgid inheritance bugs in overlayfs in
  the write path in certain conditions. Amir recently picked this back
  up in [1] and I jumped on board to fix this more generally.

  On the surface all that overlayfs would need to fix setgid inheritance
  would be to call file_remove_privs() or file_modified() but actually
  that isn't enough because the setgid inheritance api is wildly
  inconsistent in that area.

  Before this pr setgid stripping in file_remove_privs()'s old
  should_remove_suid() helper was inconsistent with other parts of the
  vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
  S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
  and the caller isn't privileged over the inode although we require
  this already in setattr_prepare() and setattr_copy() and so all
  filesystem implement this requirement implicitly because they have to
  use setattr_{prepare,copy}() anyway.

  But the inconsistency shows up in setgid stripping bugs for overlayfs
  in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
  generic/687). For example, we test whether suid and setgid stripping
  works correctly when performing various write-like operations as an
  unprivileged user (fallocate, reflink, write, etc.):

      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k

  The test basically creates a file with 6666 permissions. While the
  file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
  set.

  On a regular filesystem like xfs what will happen is:

      sys_fallocate()
      -&gt; vfs_fallocate()
         -&gt; xfs_file_fallocate()
            -&gt; file_modified()
               -&gt; __file_remove_privs()
                  -&gt; dentry_needs_remove_privs()
                     -&gt; should_remove_suid()
                  -&gt; __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -&gt; notify_change()
                        -&gt; setattr_copy()

  In should_remove_suid() we can see that ATTR_KILL_SUID is raised
  unconditionally because the file in the test has S_ISUID set.

  But we also see that ATTR_KILL_SGID won't be set because while the
  file is S_ISGID it is not S_IXGRP (see above) which is a condition for
  ATTR_KILL_SGID being raised.

  So by the time we call notify_change() we have attr-&gt;ia_valid set to
  ATTR_KILL_SUID | ATTR_FORCE.

  Now notify_change() sees that ATTR_KILL_SUID is set and does:

      ia_valid      = attr-&gt;ia_valid |= ATTR_MODE
      attr-&gt;ia_mode = (inode-&gt;i_mode &amp; ~S_ISUID);

  which means that when we call setattr_copy() later we will definitely
  update inode-&gt;i_mode. Note that attr-&gt;ia_mode still contains S_ISGID.

  Now we call into the filesystem's -&gt;setattr() inode operation which
  will end up calling setattr_copy(). Since ATTR_MODE is set we will
  hit:

      if (ia_valid &amp; ATTR_MODE) {
              umode_t mode = attr-&gt;ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &amp;&amp;
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &amp;= ~S_ISGID;
              inode-&gt;i_mode = mode;
      }

  and since the caller in the test is neither capable nor in the group
  of the inode the S_ISGID bit is stripped.

  But assume the file isn't suid then ATTR_KILL_SUID won't be raised
  which has the consequence that neither the setgid nor the suid bits
  are stripped even though it should be stripped because the inode isn't
  in the caller's groups and the caller isn't privileged over the inode.

  If overlayfs is in the mix things become a bit more complicated and
  the bug shows up more clearly.

  When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
  file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
  raised but because the check in notify_change() is questioning the
  ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
  the S_ISGID bit isn't removed even though it should be stripped:

      sys_fallocate()
      -&gt; vfs_fallocate()
         -&gt; ovl_fallocate()
            -&gt; file_remove_privs()
               -&gt; dentry_needs_remove_privs()
                  -&gt; should_remove_suid()
               -&gt; __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -&gt; notify_change()
                     -&gt; ovl_setattr()
                        /* TAKE ON MOUNTER'S CREDS */
                        -&gt; ovl_do_notify_change()
                           -&gt; notify_change()
                        /* GIVE UP MOUNTER'S CREDS */
           /* TAKE ON MOUNTER'S CREDS */
           -&gt; vfs_fallocate()
              -&gt; xfs_file_fallocate()
                 -&gt; file_modified()
                    -&gt; __file_remove_privs()
                       -&gt; dentry_needs_remove_privs()
                          -&gt; should_remove_suid()
                       -&gt; __remove_privs()
                          newattrs.ia_valid = attr_force | kill;
                          -&gt; notify_change()

  The fix for all of this is to make file_remove_privs()'s
  should_remove_suid() helper perform the same checks as we already
  require in setattr_prepare() and setattr_copy() and have
  notify_change() not pointlessly requiring S_IXGRP again. It doesn't
  make any sense in the first place because the caller must calculate
  the flags via should_remove_suid() anyway which would raise
  ATTR_KILL_SGID

  Note that some xfstests will now fail as these patches will cause the
  setgid bit to be lost in certain conditions for unprivileged users
  modifying a setgid file when they would've been kept otherwise. I
  think this risk is worth taking and I explained and mentioned this
  multiple times on the list [2].

  Enforcing the rules consistently across write operations and
  chmod/chown will lead to losing the setgid bit in cases were it
  might've been retained before.

  While I've mentioned this a few times but it's worth repeating just to
  make sure that this is understood. For the sake of maintainability,
  consistency, and security this is a risk worth taking.

  If we really see regressions for workloads the fix is to have special
  setgid handling in the write path again with different semantics from
  chmod/chown and possibly additional duct tape for overlayfs. I'll
  update the relevant xfstests with if you should decide to merge this
  second setgid cleanup.

  Before that people should be aware that there might be failures for
  fstests where unprivileged users modify a setgid file"

Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]

* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: use consistent setgid checks in is_sxid()
  ovl: remove privs in ovl_fallocate()
  ovl: remove privs in ovl_copyfile()
  attr: use consistent sgid stripping checks
  attr: add setattr_should_drop_sgid()
  fs: move should_remove_suid()
  attr: add in_group_or_capable()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull setgid inheritance updates from Christian Brauner:
 "This contains the work to make setgid inheritance consistent between
  modifying a file and when changing ownership or mode as this has been
  a repeated source of very subtle bugs. The gist is that we perform the
  same permission checks in the write path as we do in the ownership and
  mode changing paths after this series where we're currently doing
  different things.

  We've already made setgid inheritance a lot more consistent and
  reliable in the last releases by moving setgid stripping from the
  individual filesystems up into the vfs. This aims to make the logic
  even more consistent and easier to understand and also to fix
  long-standing overlayfs setgid inheritance bugs. Miklos was nice
  enough to just let me carry the trivial overlayfs patches from Amir
  too.

  Below is a more detailed explanation how the current difference in
  setgid handling lead to very subtle bugs exemplified via overlayfs
  which is a victim of the current rules. I hope this explains why I
  think taking the regression risk here is worth it.

  A long while ago I found a few setgid inheritance bugs in overlayfs in
  the write path in certain conditions. Amir recently picked this back
  up in [1] and I jumped on board to fix this more generally.

  On the surface all that overlayfs would need to fix setgid inheritance
  would be to call file_remove_privs() or file_modified() but actually
  that isn't enough because the setgid inheritance api is wildly
  inconsistent in that area.

  Before this pr setgid stripping in file_remove_privs()'s old
  should_remove_suid() helper was inconsistent with other parts of the
  vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
  S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
  and the caller isn't privileged over the inode although we require
  this already in setattr_prepare() and setattr_copy() and so all
  filesystem implement this requirement implicitly because they have to
  use setattr_{prepare,copy}() anyway.

  But the inconsistency shows up in setgid stripping bugs for overlayfs
  in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
  generic/687). For example, we test whether suid and setgid stripping
  works correctly when performing various write-like operations as an
  unprivileged user (fallocate, reflink, write, etc.):

      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k

  The test basically creates a file with 6666 permissions. While the
  file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
  set.

  On a regular filesystem like xfs what will happen is:

      sys_fallocate()
      -&gt; vfs_fallocate()
         -&gt; xfs_file_fallocate()
            -&gt; file_modified()
               -&gt; __file_remove_privs()
                  -&gt; dentry_needs_remove_privs()
                     -&gt; should_remove_suid()
                  -&gt; __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -&gt; notify_change()
                        -&gt; setattr_copy()

  In should_remove_suid() we can see that ATTR_KILL_SUID is raised
  unconditionally because the file in the test has S_ISUID set.

  But we also see that ATTR_KILL_SGID won't be set because while the
  file is S_ISGID it is not S_IXGRP (see above) which is a condition for
  ATTR_KILL_SGID being raised.

  So by the time we call notify_change() we have attr-&gt;ia_valid set to
  ATTR_KILL_SUID | ATTR_FORCE.

  Now notify_change() sees that ATTR_KILL_SUID is set and does:

      ia_valid      = attr-&gt;ia_valid |= ATTR_MODE
      attr-&gt;ia_mode = (inode-&gt;i_mode &amp; ~S_ISUID);

  which means that when we call setattr_copy() later we will definitely
  update inode-&gt;i_mode. Note that attr-&gt;ia_mode still contains S_ISGID.

  Now we call into the filesystem's -&gt;setattr() inode operation which
  will end up calling setattr_copy(). Since ATTR_MODE is set we will
  hit:

      if (ia_valid &amp; ATTR_MODE) {
              umode_t mode = attr-&gt;ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &amp;&amp;
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &amp;= ~S_ISGID;
              inode-&gt;i_mode = mode;
      }

  and since the caller in the test is neither capable nor in the group
  of the inode the S_ISGID bit is stripped.

  But assume the file isn't suid then ATTR_KILL_SUID won't be raised
  which has the consequence that neither the setgid nor the suid bits
  are stripped even though it should be stripped because the inode isn't
  in the caller's groups and the caller isn't privileged over the inode.

  If overlayfs is in the mix things become a bit more complicated and
  the bug shows up more clearly.

  When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
  file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
  raised but because the check in notify_change() is questioning the
  ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
  the S_ISGID bit isn't removed even though it should be stripped:

      sys_fallocate()
      -&gt; vfs_fallocate()
         -&gt; ovl_fallocate()
            -&gt; file_remove_privs()
               -&gt; dentry_needs_remove_privs()
                  -&gt; should_remove_suid()
               -&gt; __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -&gt; notify_change()
                     -&gt; ovl_setattr()
                        /* TAKE ON MOUNTER'S CREDS */
                        -&gt; ovl_do_notify_change()
                           -&gt; notify_change()
                        /* GIVE UP MOUNTER'S CREDS */
           /* TAKE ON MOUNTER'S CREDS */
           -&gt; vfs_fallocate()
              -&gt; xfs_file_fallocate()
                 -&gt; file_modified()
                    -&gt; __file_remove_privs()
                       -&gt; dentry_needs_remove_privs()
                          -&gt; should_remove_suid()
                       -&gt; __remove_privs()
                          newattrs.ia_valid = attr_force | kill;
                          -&gt; notify_change()

  The fix for all of this is to make file_remove_privs()'s
  should_remove_suid() helper perform the same checks as we already
  require in setattr_prepare() and setattr_copy() and have
  notify_change() not pointlessly requiring S_IXGRP again. It doesn't
  make any sense in the first place because the caller must calculate
  the flags via should_remove_suid() anyway which would raise
  ATTR_KILL_SGID

  Note that some xfstests will now fail as these patches will cause the
  setgid bit to be lost in certain conditions for unprivileged users
  modifying a setgid file when they would've been kept otherwise. I
  think this risk is worth taking and I explained and mentioned this
  multiple times on the list [2].

  Enforcing the rules consistently across write operations and
  chmod/chown will lead to losing the setgid bit in cases were it
  might've been retained before.

  While I've mentioned this a few times but it's worth repeating just to
  make sure that this is understood. For the sake of maintainability,
  consistency, and security this is a risk worth taking.

  If we really see regressions for workloads the fix is to have special
  setgid handling in the write path again with different semantics from
  chmod/chown and possibly additional duct tape for overlayfs. I'll
  update the relevant xfstests with if you should decide to merge this
  second setgid cleanup.

  Before that people should be aware that there might be failures for
  fstests where unprivileged users modify a setgid file"

Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]

* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: use consistent setgid checks in is_sxid()
  ovl: remove privs in ovl_fallocate()
  ovl: remove privs in ovl_copyfile()
  attr: use consistent sgid stripping checks
  attr: add setattr_should_drop_sgid()
  fs: move should_remove_suid()
  attr: add in_group_or_capable()
</pre>
</div>
</content>
</entry>
<entry>
<title>Documentation/osnoise: Add osnoise/options documentation</title>
<updated>2022-12-10T18:36:05+00:00</updated>
<author>
<name>Daniel Bristot de Oliveira</name>
<email>bristot@kernel.org</email>
</author>
<published>2022-12-09T22:05:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d358dfe60b7724ad0acb8cf8375a608b983e2b59'/>
<id>d358dfe60b7724ad0acb8cf8375a608b983e2b59</id>
<content type='text'>
Add the documentation about the osnoise/options file, the options,
and some additional explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/fde5567a4bae364f67fd1e9a644d1d62862618a6.1670623111.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Reviewed-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the documentation about the osnoise/options file, the options,
and some additional explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/fde5567a4bae364f67fd1e9a644d1d62862618a6.1670623111.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Reviewed-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Documentation/osnoise: Escape underscore of NO_ prefix</title>
<updated>2022-12-10T18:36:05+00:00</updated>
<author>
<name>Bagas Sanjaya</name>
<email>bagasdotme@gmail.com</email>
</author>
<published>2022-11-25T03:43:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0e162c6f1c8f7f502b3978f5a37ed6fa2e4d21f9'/>
<id>0e162c6f1c8f7f502b3978f5a37ed6fa2e4d21f9</id>
<content type='text'>
kernel test robot reported unknown target name warning:

Documentation/trace/osnoise-tracer.rst:112: WARNING: Unknown target name: "no".

The warning causes NO_ prefix to be rendered as link text instead, which
points to non-existent link target.

Escape the prefix underscore to fix the warning.

Link: https://lkml.kernel.org/r/20221125034300.24168-1-bagasdotme@gmail.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Ammar Faizi &lt;ammarfaizi2@gnuweeb.org&gt;
Cc: GNU/Weeb Mailing List &lt;gwml@vger.gnuweeb.org&gt;
Link: https://lore.kernel.org/linux-doc/202211240447.HxRNftE5-lkp@intel.com/
Fixes: 67543cd6b8eee5 ("Documentation/osnoise: Add osnoise/options documentation")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Acked-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kernel test robot reported unknown target name warning:

Documentation/trace/osnoise-tracer.rst:112: WARNING: Unknown target name: "no".

The warning causes NO_ prefix to be rendered as link text instead, which
points to non-existent link target.

Escape the prefix underscore to fix the warning.

Link: https://lkml.kernel.org/r/20221125034300.24168-1-bagasdotme@gmail.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Ammar Faizi &lt;ammarfaizi2@gnuweeb.org&gt;
Cc: GNU/Weeb Mailing List &lt;gwml@vger.gnuweeb.org&gt;
Link: https://lore.kernel.org/linux-doc/202211240447.HxRNftE5-lkp@intel.com/
Fixes: 67543cd6b8eee5 ("Documentation/osnoise: Add osnoise/options documentation")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Acked-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: docs: Update histogram doc for .percent/.graph and 'nohitcount'</title>
<updated>2022-12-10T04:48:05+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2022-10-19T15:31:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8c2b99790196d34a58a2b00a4c9862d2de3af3e2'/>
<id>8c2b99790196d34a58a2b00a4c9862d2de3af3e2</id>
<content type='text'>
Update histogram document for .percent/.graph suffixes and 'nohitcount'
option.

Link: https://lore.kernel.org/linux-trace-kernel/166610815604.56030.4124933216911828519.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Tom Zanussi &lt;zanussi@kernel.org&gt;
Tested-by: Tom Zanussi &lt;zanussi@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update histogram document for .percent/.graph suffixes and 'nohitcount'
option.

Link: https://lore.kernel.org/linux-trace-kernel/166610815604.56030.4124933216911828519.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Tom Zanussi &lt;zanussi@kernel.org&gt;
Tested-by: Tom Zanussi &lt;zanussi@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Documentation/osnoise: Add osnoise/options documentation</title>
<updated>2022-11-24T00:08:31+00:00</updated>
<author>
<name>Daniel Bristot de Oliveira</name>
<email>bristot@kernel.org</email>
</author>
<published>2022-11-17T13:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=67543cd6b8eee53959e624b9ce420ca4d47be0c8'/>
<id>67543cd6b8eee53959e624b9ce420ca4d47be0c8</id>
<content type='text'>
Add the documentation about the osnoise/options file, along
with an explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the documentation about the osnoise/options file, along
with an explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing/histogram: Update document for KEYS_MAX size</title>
<updated>2022-10-18T21:41:56+00:00</updated>
<author>
<name>Zheng Yejian</name>
<email>zhengyejian1@huawei.com</email>
</author>
<published>2022-10-17T10:38:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a635beeacc6d56d2b71c39e6c0103f85b53d108e'/>
<id>a635beeacc6d56d2b71c39e6c0103f85b53d108e</id>
<content type='text'>
After commit 4f36c2d85ced ("tracing: Increase tracing map KEYS_MAX size"),
'keys' supports up to three fields.

Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 4f36c2d85ced ("tracing: Increase tracing map KEYS_MAX size"),
'keys' supports up to three fields.

Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
