<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/kprobes.c, branch v6.7-rc2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>kprobes: kretprobe scalability improvement</title>
<updated>2023-10-18T14:59:54+00:00</updated>
<author>
<name>wuqiang.matt</name>
<email>wuqiang.matt@bytedance.com</email>
</author>
<published>2023-10-17T13:56:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4bbd9345565933823f38a419df65661f12adbe5e'/>
<id>4bbd9345565933823f38a419df65661f12adbe5e</id>
<content type='text'>
kretprobe is using freelist to manage return-instances, but freelist,
as LIFO queue based on singly linked list, scales badly and reduces
the overall throughput of kretprobed routines, especially for high
contention scenarios.

Here's a typical throughput test of sys_prctl (counts in 10 seconds,
measured with perf stat -a -I 10000 -e syscalls:sys_enter_prctl):

OS: Debian 10 X86_64, Linux 6.5rc7 with freelist
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

         1T       2T       4T       8T      16T      24T
   24150045 29317964 15446741 12494489 18287272 17708768
        32T      48T      64T      72T      96T     128T
   16200682 13737658 11645677 11269858 10470118  9931051

This patch introduces objpool to replace freelist. objpool is a
high performance queue, which can bring near-linear scalability
to kretprobed routines. Tests of kretprobe throughput show the
biggest ratio as 159x of original freelist. Here's the result:

                  1T         2T         4T         8T        16T
native:     41186213   82336866  164250978  328662645  658810299
freelist:   24150045   29317964   15446741   12494489   18287272
objpool:    23926730   48010314   96125218  191782984  385091769
                 32T        48T        64T        96T       128T
native:   1330338351 1969957941 2512291791 2615754135 2671040914
freelist:   16200682   13737658   11645677   10470118    9931051
objpool:   764481096 1147149781 1456220214 1502109662 1579015050

Testings on 96-core ARM64 output similarly, but with the biggest
ratio up to 448x:

OS: Debian 10 AARCH64, Linux 6.5rc7
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                  1T         2T         4T         8T        16T
native: .   30066096   63569843  126194076  257447289  505800181
freelist:   16152090   11064397   11124068    7215768    5663013
objpool:    13997541   28032100   55726624  110099926  221498787
                 24T        32T        48T        64T        96T
native:    763305277 1015925192 1521075123 2033009392 3021013752
freelist:    5015810    4602893    3766792    3382478    2945292
objpool:   328192025  439439564  668534502  887401381 1319972072

Link: https://lore.kernel.org/all/20231017135654.82270-4-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang.matt &lt;wuqiang.matt@bytedance.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kretprobe is using freelist to manage return-instances, but freelist,
as LIFO queue based on singly linked list, scales badly and reduces
the overall throughput of kretprobed routines, especially for high
contention scenarios.

Here's a typical throughput test of sys_prctl (counts in 10 seconds,
measured with perf stat -a -I 10000 -e syscalls:sys_enter_prctl):

OS: Debian 10 X86_64, Linux 6.5rc7 with freelist
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

         1T       2T       4T       8T      16T      24T
   24150045 29317964 15446741 12494489 18287272 17708768
        32T      48T      64T      72T      96T     128T
   16200682 13737658 11645677 11269858 10470118  9931051

This patch introduces objpool to replace freelist. objpool is a
high performance queue, which can bring near-linear scalability
to kretprobed routines. Tests of kretprobe throughput show the
biggest ratio as 159x of original freelist. Here's the result:

                  1T         2T         4T         8T        16T
native:     41186213   82336866  164250978  328662645  658810299
freelist:   24150045   29317964   15446741   12494489   18287272
objpool:    23926730   48010314   96125218  191782984  385091769
                 32T        48T        64T        96T       128T
native:   1330338351 1969957941 2512291791 2615754135 2671040914
freelist:   16200682   13737658   11645677   10470118    9931051
objpool:   764481096 1147149781 1456220214 1502109662 1579015050

Testings on 96-core ARM64 output similarly, but with the biggest
ratio up to 448x:

OS: Debian 10 AARCH64, Linux 6.5rc7
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                  1T         2T         4T         8T        16T
native: .   30066096   63569843  126194076  257447289  505800181
freelist:   16152090   11064397   11124068    7215768    5663013
objpool:    13997541   28032100   55726624  110099926  221498787
                 24T        32T        48T        64T        96T
native:    763305277 1015925192 1521075123 2033009392 3021013752
freelist:    5015810    4602893    3766792    3382478    2945292
objpool:   328192025  439439564  668534502  887401381 1319972072

Link: https://lore.kernel.org/all/20231017135654.82270-4-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang.matt &lt;wuqiang.matt@bytedance.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel: kprobes: Use struct_size()</title>
<updated>2023-08-23T00:38:17+00:00</updated>
<author>
<name>Ruan Jinjie</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2023-07-25T19:54:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8865aea0471c512c2d94220c60a0083cefcb9348'/>
<id>8865aea0471c512c2d94220c60a0083cefcb9348</id>
<content type='text'>
Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.

This is less verbose.

Link: https://lore.kernel.org/all/20230725195424.3469242-1-ruanjinjie@huawei.com/

Signed-off-by: Ruan Jinjie &lt;ruanjinjie@huawei.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.

This is less verbose.

Link: https://lore.kernel.org/all/20230725195424.3469242-1-ruanjinjie@huawei.com/

Signed-off-by: Ruan Jinjie &lt;ruanjinjie@huawei.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kprobes: Prohibit probing on CFI preamble symbol</title>
<updated>2023-07-29T14:32:26+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2023-07-11T01:50:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd'/>
<id>de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd</id>
<content type='text'>
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-07-12T19:01:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-12T19:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a3236ce48406c3190dfa06137636525001b32f5'/>
<id>9a3236ce48406c3190dfa06137636525001b32f5</id>
<content type='text'>
Pull probes fixes from Masami Hiramatsu:

 - Fix fprobe's rethook release issues:

     - Release rethook after ftrace_ops is unregistered so that the
       rethook is not accessed after free.

     - Stop rethook before ftrace_ops is unregistered so that the
       rethook is NOT used after exiting unregister_fprobe()

 - Fix eprobe cleanup logic. If it attaches to multiple events and
   failes to enable one of them, rollback all enabled events correctly.

 - Fix fprobe to unlock ftrace recursion lock correctly when it missed
   by another running kprobe.

 - Cleanup kprobe to remove unnecessary NULL.

 - Cleanup kprobe to remove unnecessary 0 initializations.

* tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
  kernel: kprobes: Remove unnecessary ‘0’ values
  kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
  fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
  kernel/trace: Fix cleanup logic of enable_trace_eprobe
  fprobe: Release rethook after the ftrace_ops is unregistered
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull probes fixes from Masami Hiramatsu:

 - Fix fprobe's rethook release issues:

     - Release rethook after ftrace_ops is unregistered so that the
       rethook is not accessed after free.

     - Stop rethook before ftrace_ops is unregistered so that the
       rethook is NOT used after exiting unregister_fprobe()

 - Fix eprobe cleanup logic. If it attaches to multiple events and
   failes to enable one of them, rollback all enabled events correctly.

 - Fix fprobe to unlock ftrace recursion lock correctly when it missed
   by another running kprobe.

 - Cleanup kprobe to remove unnecessary NULL.

 - Cleanup kprobe to remove unnecessary 0 initializations.

* tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
  kernel: kprobes: Remove unnecessary ‘0’ values
  kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
  fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
  kernel/trace: Fix cleanup logic of enable_trace_eprobe
  fprobe: Release rethook after the ftrace_ops is unregistered
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel: kprobes: Remove unnecessary ‘0’ values</title>
<updated>2023-07-10T15:50:51+00:00</updated>
<author>
<name>Li zeming</name>
<email>zeming@nfschina.com</email>
</author>
<published>2023-07-11T18:53:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ed9492dfef8738ca68879f5690dda5a04f1897dc'/>
<id>ed9492dfef8738ca68879f5690dda5a04f1897dc</id>
<content type='text'>
it is assigned first, so it does not need to initialize the assignment.

Link: https://lore.kernel.org/all/20230711185353.3218-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
it is assigned first, so it does not need to initialize the assignment.

Link: https://lore.kernel.org/all/20230711185353.3218-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr</title>
<updated>2023-07-10T15:50:35+00:00</updated>
<author>
<name>Li zeming</name>
<email>zeming@nfschina.com</email>
</author>
<published>2023-07-04T19:43:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e1164787f22b51010cfdc6a5e41b744435836b79'/>
<id>e1164787f22b51010cfdc6a5e41b744435836b79</id>
<content type='text'>
The 'correct_ret_addr' pointer is always set in the later code, no need
to initialize it at definition time.

Link: https://lore.kernel.org/all/20230704194359.3124-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 'correct_ret_addr' pointer is always set in the later code, no need
to initialize it at definition time.

Link: https://lore.kernel.org/all/20230704194359.3124-1-zeming@nfschina.com/

Signed-off-by: Li zeming &lt;zeming@nfschina.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fprobe: Pass return address to the handlers</title>
<updated>2023-06-06T12:39:55+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2023-06-06T12:39:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cb16330d12741f6dae56aad5acf62f5be3a06c4e'/>
<id>cb16330d12741f6dae56aad5acf62f5be3a06c4e</id>
<content type='text'>
Pass return address as 'ret_ip' to the fprobe entry and return handlers
so that the fprobe user handler can get the reutrn address without
analyzing arch-dependent pt_regs.

Link: https://lore.kernel.org/all/168507467664.913472.11642316698862778600.stgit@mhiramat.roam.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass return address as 'ret_ip' to the fprobe entry and return handlers
so that the fprobe user handler can get the reutrn address without
analyzing arch-dependent pt_regs.

Link: https://lore.kernel.org/all/168507467664.913472.11642316698862778600.stgit@mhiramat.roam.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range</title>
<updated>2023-02-20T23:49:16+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-02-20T23:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6'/>
<id>f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6</id>
<content type='text'>
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.

For example, assume that register two kprobes whose addresses are
&lt;func+9&gt; and &lt;func+11&gt; in "func" function.
The original code of "func" function is as follows:

   0xffffffff816cb5e9 &lt;+9&gt;:     push   %r12
   0xffffffff816cb5eb &lt;+11&gt;:    xor    %r12d,%r12d
   0xffffffff816cb5ee &lt;+14&gt;:    test   %rdi,%rdi
   0xffffffff816cb5f1 &lt;+17&gt;:    setne  %r12b
   0xffffffff816cb5f5 &lt;+21&gt;:    push   %rbp

1.Register the kprobe for &lt;func+11&gt;, assume that is kp1, corresponding optimized_kprobe is op1.
  After the optimization, "func" code changes to:

   0xffffffff816cc079 &lt;+9&gt;:     push   %r12
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

Now op1-&gt;flags == KPROBE_FLAG_OPTIMATED;

2. Register the kprobe for &lt;func+9&gt;, assume that is kp2, corresponding optimized_kprobe is op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1-&gt;optinsn.copied_insn,
                                      // jump address = &lt;func+14&gt;

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p-&gt;flags |= KPROBE_FLAG_DISABLED;  // op1-&gt;flags ==  KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) &amp;&amp; !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) &lt; 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
        return;
      p-&gt;kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 &lt;+9&gt;:     int3
   0xffffffff816cc07a &lt;+10&gt;:    push   %rsp
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

5. if call "func", int3 handler call setup_detour_execution:

  if (p-&gt;flags &amp; KPROBE_FLAG_OPTIMIZED) {
    ...
    regs-&gt;ip = (unsigned long)op-&gt;optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee &lt;func+14&gt;

However, &lt;func+14&gt; is not a valid start instruction address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.

For example, assume that register two kprobes whose addresses are
&lt;func+9&gt; and &lt;func+11&gt; in "func" function.
The original code of "func" function is as follows:

   0xffffffff816cb5e9 &lt;+9&gt;:     push   %r12
   0xffffffff816cb5eb &lt;+11&gt;:    xor    %r12d,%r12d
   0xffffffff816cb5ee &lt;+14&gt;:    test   %rdi,%rdi
   0xffffffff816cb5f1 &lt;+17&gt;:    setne  %r12b
   0xffffffff816cb5f5 &lt;+21&gt;:    push   %rbp

1.Register the kprobe for &lt;func+11&gt;, assume that is kp1, corresponding optimized_kprobe is op1.
  After the optimization, "func" code changes to:

   0xffffffff816cc079 &lt;+9&gt;:     push   %r12
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

Now op1-&gt;flags == KPROBE_FLAG_OPTIMATED;

2. Register the kprobe for &lt;func+9&gt;, assume that is kp2, corresponding optimized_kprobe is op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1-&gt;optinsn.copied_insn,
                                      // jump address = &lt;func+14&gt;

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p-&gt;flags |= KPROBE_FLAG_DISABLED;  // op1-&gt;flags ==  KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) &amp;&amp; !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) &lt; 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
        return;
      p-&gt;kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 &lt;+9&gt;:     int3
   0xffffffff816cc07a &lt;+10&gt;:    push   %rsp
   0xffffffff816cc07b &lt;+11&gt;:    jmp    0xffffffffa0210000
   0xffffffff816cc080 &lt;+16&gt;:    incl   0xf(%rcx)
   0xffffffff816cc083 &lt;+19&gt;:    xchg   %eax,%ebp
   0xffffffff816cc084 &lt;+20&gt;:    (bad)
   0xffffffff816cc085 &lt;+21&gt;:    push   %rbp

5. if call "func", int3 handler call setup_detour_execution:

  if (p-&gt;flags &amp; KPROBE_FLAG_OPTIMIZED) {
    ...
    regs-&gt;ip = (unsigned long)op-&gt;optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee &lt;func+14&gt;

However, &lt;func+14&gt; is not a valid start instruction address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/kprobes: Fix __recover_optprobed_insn check optimizing logic</title>
<updated>2023-02-20T23:49:16+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-02-20T23:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=868a6fc0ca2407622d2833adefe1c4d284766c4c'/>
<id>868a6fc0ca2407622d2833adefe1c4d284766c4c</id>
<content type='text'>
Since the following commit:

  commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp-&gt;flags
has KPROBE_FLAG_OPTIMIZED and op-&gt;list is not empty.

The __recover_optprobed_insn check logic is incorrect, a kprobe in the
unoptimizing state may be incorrectly determined as unoptimizing.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported for invoking in
arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the following commit:

  commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp-&gt;flags
has KPROBE_FLAG_OPTIMIZED and op-&gt;list is not empty.

The __recover_optprobed_insn check logic is incorrect, a kprobe in the
unoptimizing state may be incorrectly determined as unoptimizing.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported for invoking in
arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list</title>
<updated>2023-02-20T23:49:16+00:00</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2023-02-20T23:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4fbd2f83fda0ca44a2ec6421ca3508b355b31858'/>
<id>4fbd2f83fda0ca44a2ec6421ca3508b355b31858</id>
<content type='text'>
Since forcibly unoptimized kprobes will be put on the freeing_list directly
in the unoptimize_kprobe(), do_unoptimize_kprobes() must continue to check
the freeing_list even if unoptimizing_list is empty.

This bug can happen if a kprobe is put in an instruction which is in the
middle of the jump-replaced instruction sequence of an optprobe, *and* the
optprobe is recently unregistered and queued on unoptimizing_list.
In this case, the optprobe will be unoptimized forcibly (means immediately)
and put it into the freeing_list, expecting the optprobe will be handled in
do_unoptimize_kprobe().
But if there is no other optprobes on the unoptimizing_list, current code
returns from the do_unoptimize_kprobe() soon and does not handle the
optprobe which is on the freeing_list. Then the optprobe will hit the
WARN_ON_ONCE() in the do_free_cleaned_kprobes(), because it is not handled
in the latter loop of the do_unoptimize_kprobe().

To solve this issue, do not return from do_unoptimize_kprobes() immediately
even if unoptimizing_list is empty.

Moreover, this change affects another case. kill_optimized_kprobes() expects
kprobe_optimizer() will just free the optprobe on freeing_list.
So I changed it to just do list_move() to freeing_list if optprobes are on
unoptimizing list. And the do_unoptimize_kprobe() will skip
arch_disarm_kprobe() if the probe on freeing_list has gone flag.

Link: https://lore.kernel.org/all/Y8URdIfVr3pq2X8w@xpf.sh.intel.com/
Link: https://lore.kernel.org/all/167448024501.3253718.13037333683110512967.stgit@devnote3/

Fixes: e4add247789e ("kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic")
Reported-by: Pengfei Xu &lt;pengfei.xu@intel.com&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Cc: stable@vger.kernel.org
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since forcibly unoptimized kprobes will be put on the freeing_list directly
in the unoptimize_kprobe(), do_unoptimize_kprobes() must continue to check
the freeing_list even if unoptimizing_list is empty.

This bug can happen if a kprobe is put in an instruction which is in the
middle of the jump-replaced instruction sequence of an optprobe, *and* the
optprobe is recently unregistered and queued on unoptimizing_list.
In this case, the optprobe will be unoptimized forcibly (means immediately)
and put it into the freeing_list, expecting the optprobe will be handled in
do_unoptimize_kprobe().
But if there is no other optprobes on the unoptimizing_list, current code
returns from the do_unoptimize_kprobe() soon and does not handle the
optprobe which is on the freeing_list. Then the optprobe will hit the
WARN_ON_ONCE() in the do_free_cleaned_kprobes(), because it is not handled
in the latter loop of the do_unoptimize_kprobe().

To solve this issue, do not return from do_unoptimize_kprobes() immediately
even if unoptimizing_list is empty.

Moreover, this change affects another case. kill_optimized_kprobes() expects
kprobe_optimizer() will just free the optprobe on freeing_list.
So I changed it to just do list_move() to freeing_list if optprobes are on
unoptimizing list. And the do_unoptimize_kprobe() will skip
arch_disarm_kprobe() if the probe on freeing_list has gone flag.

Link: https://lore.kernel.org/all/Y8URdIfVr3pq2X8w@xpf.sh.intel.com/
Link: https://lore.kernel.org/all/167448024501.3253718.13037333683110512967.stgit@devnote3/

Fixes: e4add247789e ("kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic")
Reported-by: Pengfei Xu &lt;pengfei.xu@intel.com&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Cc: stable@vger.kernel.org
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
