<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/asm-generic/bitops, branch v5.15-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge tag 'asm-generic-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic</title>
<updated>2021-09-01T22:13:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-01T22:13:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4cdc4cc2ad35f92338497d53d3e8b7876cf2a51d'/>
<id>4cdc4cc2ad35f92338497d53d3e8b7876cf2a51d</id>
<content type='text'>
Pull asm-generic updates from Arnd Bergmann:
 "The main content for 5.15 is a series that cleans up the handling of
  strncpy_from_user() and strnlen_user(), removing a lot of slightly
  incorrect versions of these in favor of the lib/strn*.c helpers that
  implement these correctly and more efficiently.

  The only architectures that retain a private version now are mips,
  ia64, um and parisc. I had offered to convert those at all, but Thomas
  Bogendoerfer wanted to keep the mips version for the moment until he
  had a chance to do regression testing.

  The branch also contains two patches for bitops and for ffs()"

* tag 'asm-generic-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  bitops/non-atomic: make @nr unsigned to avoid any DIV
  asm-generic: ffs: Drop bogus reference to ffz location
  asm-generic: reverse GENERIC_{STRNCPY_FROM,STRNLEN}_USER symbols
  asm-generic: remove extra strn{cpy_from,len}_user declarations
  asm-generic: uaccess: remove inline strncpy_from_user/strnlen_user
  s390: use generic strncpy/strnlen from_user
  microblaze: use generic strncpy/strnlen from_user
  csky: use generic strncpy/strnlen from_user
  arc: use generic strncpy/strnlen from_user
  hexagon: use generic strncpy/strnlen from_user
  h8300: remove stale strncpy_from_user
  asm-generic/uaccess.h: remove __strncpy_from_user/__strnlen_user
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull asm-generic updates from Arnd Bergmann:
 "The main content for 5.15 is a series that cleans up the handling of
  strncpy_from_user() and strnlen_user(), removing a lot of slightly
  incorrect versions of these in favor of the lib/strn*.c helpers that
  implement these correctly and more efficiently.

  The only architectures that retain a private version now are mips,
  ia64, um and parisc. I had offered to convert those at all, but Thomas
  Bogendoerfer wanted to keep the mips version for the moment until he
  had a chance to do regression testing.

  The branch also contains two patches for bitops and for ffs()"

* tag 'asm-generic-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  bitops/non-atomic: make @nr unsigned to avoid any DIV
  asm-generic: ffs: Drop bogus reference to ffz location
  asm-generic: reverse GENERIC_{STRNCPY_FROM,STRNLEN}_USER symbols
  asm-generic: remove extra strn{cpy_from,len}_user declarations
  asm-generic: uaccess: remove inline strncpy_from_user/strnlen_user
  s390: use generic strncpy/strnlen from_user
  microblaze: use generic strncpy/strnlen from_user
  csky: use generic strncpy/strnlen from_user
  arc: use generic strncpy/strnlen from_user
  hexagon: use generic strncpy/strnlen from_user
  h8300: remove stale strncpy_from_user
  asm-generic/uaccess.h: remove __strncpy_from_user/__strnlen_user
</pre>
</div>
</content>
</entry>
<entry>
<title>bitops/non-atomic: make @nr unsigned to avoid any DIV</title>
<updated>2021-08-14T11:07:42+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>Vineet.Gupta1@synopsys.com</email>
</author>
<published>2021-08-05T19:14:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8f76f9c46952659dd925c21c3f62a0d05a3f3e71'/>
<id>8f76f9c46952659dd925c21c3f62a0d05a3f3e71</id>
<content type='text'>
signed math causes generation of costlier instructions such as DIV when
they could be done by barrerl shifter.

Worse part is this is not caught by things like bloat-o-meter since
instruction length / symbols are typically same size.

e.g.

stock (signed math)
__________________

919b4614 &lt;test_taint&gt;:
919b4614:	div	r2,r0,0x20
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s

(patched) unsigned math
__________________

919b4614 &lt;test_taint&gt;:
919b4614:	lsr	r2,r0,0x5  @nr/32
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0     #test_bit()
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
signed math causes generation of costlier instructions such as DIV when
they could be done by barrerl shifter.

Worse part is this is not caught by things like bloat-o-meter since
instruction length / symbols are typically same size.

e.g.

stock (signed math)
__________________

919b4614 &lt;test_taint&gt;:
919b4614:	div	r2,r0,0x20
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s

(patched) unsigned math
__________________

919b4614 &lt;test_taint&gt;:
919b4614:	lsr	r2,r0,0x5  @nr/32
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0     #test_bit()
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: ffs: Drop bogus reference to ffz location</title>
<updated>2021-08-11T09:38:10+00:00</updated>
<author>
<name>Geert Uytterhoeven</name>
<email>geert+renesas@glider.be</email>
</author>
<published>2021-07-14T09:44:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fc062ad8e406a08b624b3ab3427434800ff886f8'/>
<id>fc062ad8e406a08b624b3ab3427434800ff886f8</id>
<content type='text'>
The generic definition of ffz() is not defined in the same header files
as the generic definitions of ffs().

Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The generic definition of ffz() is not defined in the same header files
as the generic definitions of ffs().

Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/atomic: simplify non-atomic wrappers</title>
<updated>2021-08-04T13:16:47+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2021-07-21T15:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9248e52fec9536590852844b0634b5d20483c1ab'/>
<id>9248e52fec9536590852844b0634b5d20483c1ab</id>
<content type='text'>
Since the non-atomic arch_*() bitops use plain accesses, they are
implicitly instrumnted by the compiler, and we work around this in the
instrumented wrappers to avoid double instrumentation.

It's simpler to avoid the wrappers entirely, and use the preprocessor to
alias the arch_*() bitops to their regular versions, removing the need
for checks in the instrumented wrappers.

Suggested-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Link: https://lore.kernel.org/r/20210721155813.17082-1-mark.rutland@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the non-atomic arch_*() bitops use plain accesses, they are
implicitly instrumnted by the compiler, and we work around this in the
instrumented wrappers to avoid double instrumentation.

It's simpler to avoid the wrappers entirely, and use the preprocessor to
alias the arch_*() bitops to their regular versions, removing the need
for checks in the instrumented wrappers.

Suggested-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Link: https://lore.kernel.org/r/20210721155813.17082-1-mark.rutland@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/atomic: add generic arch_*() bitops</title>
<updated>2021-07-16T16:46:45+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2021-07-13T10:52:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cf3ee3c8c29dc349b2cf52e5e72e8cb805ff5e57'/>
<id>cf3ee3c8c29dc349b2cf52e5e72e8cb805ff5e57</id>
<content type='text'>
Now that all architectures provide arch_atomic_long_*(), we can
implement the generic bitops atop these rather than atop
atomic_long_*(), and provide arch_*() forms of the bitops that are safe
to use in noinstr code.

Now that all architectures provide arch_atomic_long_*(), we can
build the generic arch_*() bitops atop these, which can be safely used
in noinstr code. The regular bitop wrappers are built atop these.

As the generic non-atomic bitops use plain accesses, these will be
implicitly instrumented unless they are inlined into noinstr functions
(which is similar to arch_atomic*_read() when based on READ_ONCE()).
The wrappers are modified so that where the underlying arch_*() function
uses a plain access, no explicit instrumentation is added, as this is
redundant and could result in confusing reports.

Since function prototypes get excessively long with both an `arch_`
prefix and `__always_inline` attribute, the return type and function
attributes have been split onto a separate line, matching the style of
the generated atomic headers.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210713105253.7615-6-mark.rutland@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that all architectures provide arch_atomic_long_*(), we can
implement the generic bitops atop these rather than atop
atomic_long_*(), and provide arch_*() forms of the bitops that are safe
to use in noinstr code.

Now that all architectures provide arch_atomic_long_*(), we can
build the generic arch_*() bitops atop these, which can be safely used
in noinstr code. The regular bitop wrappers are built atop these.

As the generic non-atomic bitops use plain accesses, these will be
implicitly instrumented unless they are inlined into noinstr functions
(which is similar to arch_atomic*_read() when based on READ_ONCE()).
The wrappers are modified so that where the underlying arch_*() function
uses a plain access, no explicit instrumentation is added, as this is
redundant and could result in confusing reports.

Since function prototypes get excessively long with both an `arch_`
prefix and `__always_inline` attribute, the return type and function
attributes have been split onto a separate line, matching the style of
the generated atomic headers.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210713105253.7615-6-mark.rutland@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: add fast path for find_first_*_bit() and find_last_bit()</title>
<updated>2021-05-07T02:24:12+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2021-05-07T01:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2cc7b6a44ac21d31b398b03f4845c53152070416'/>
<id>2cc7b6a44ac21d31b398b03f4845c53152070416</id>
<content type='text'>
Similarly to bitmap functions, users would benefit if we'll handle a case
of small-size bitmaps that fit into a single word.

While here, move the find_last_bit() declaration to bitops/find.h where
other find_*_bit() functions sit.

Link: https://lkml.kernel.org/r/20210401003153.97325-11-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similarly to bitmap functions, users would benefit if we'll handle a case
of small-size bitmaps that fit into a single word.

While here, move the find_last_bit() declaration to bitops/find.h where
other find_*_bit() functions sit.

Link: https://lkml.kernel.org/r/20210401003153.97325-11-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: add fast path for find_next_*_bit()</title>
<updated>2021-05-07T02:24:12+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2021-05-07T01:03:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=277a20a498d30753f5d8a607dbf967bc163552c1'/>
<id>277a20a498d30753f5d8a607dbf967bc163552c1</id>
<content type='text'>
Similarly to bitmap functions, find_next_*_bit() users will benefit if
we'll handle a case of bitmaps that fit into a single word inline.  In the
very best case, the compiler may replace a function call with a few
instructions.

This is the quite typical find_next_bit() user:

	unsigned int cpumask_next(int n, const struct cpumask *srcp)
	{
		/* -1 is a legal arg here. */
		if (n != -1)
			cpumask_check(n);
		return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
	}
	EXPORT_SYMBOL(cpumask_next);

Currently, on ARM64 the generated code looks like this:
	0000000000000000 &lt;cpumask_next&gt;:
	   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
	   4:   11000402        add     w2, w0, #0x1
	   8:   aa0103e0        mov     x0, x1
	   c:   d2800401        mov     x1, #0x40                       // #64
	  10:   910003fd        mov     x29, sp
	  14:   93407c42        sxtw    x2, w2
	  18:   94000000        bl      0 &lt;find_next_bit&gt;
	  1c:   a8c17bfd        ldp     x29, x30, [sp], #16
	  20:   d65f03c0        ret
	  24:   d503201f        nop

After applying this patch:
	0000000000000140 &lt;cpumask_next&gt;:
	 140:   11000400        add     w0, w0, #0x1
	 144:   93407c00        sxtw    x0, w0
	 148:   f100fc1f        cmp     x0, #0x3f
	 14c:   54000168        b.hi    178 &lt;cpumask_next+0x38&gt;  // b.pmore
	 150:   f9400023        ldr     x3, [x1]
	 154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
	 158:   9ac02020        lsl     x0, x1, x0
	 15c:   52800802        mov     w2, #0x40                       // #64
	 160:   8a030001        and     x1, x0, x3
	 164:   dac00020        rbit    x0, x1
	 168:   f100003f        cmp     x1, #0x0
	 16c:   dac01000        clz     x0, x0
	 170:   1a800040        csel    w0, w2, w0, eq  // eq = none
	 174:   d65f03c0        ret
	 178:   52800800        mov     w0, #0x40                       // #64
	 17c:   d65f03c0        ret

find_next_bit() call is replaced with 6 instructions.  find_next_bit()
itself is 41 instructions plus function call overhead.

Despite inlining, the scripts/bloat-o-meter report smaller .text size
after applying the series:
	add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)

Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similarly to bitmap functions, find_next_*_bit() users will benefit if
we'll handle a case of bitmaps that fit into a single word inline.  In the
very best case, the compiler may replace a function call with a few
instructions.

This is the quite typical find_next_bit() user:

	unsigned int cpumask_next(int n, const struct cpumask *srcp)
	{
		/* -1 is a legal arg here. */
		if (n != -1)
			cpumask_check(n);
		return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
	}
	EXPORT_SYMBOL(cpumask_next);

Currently, on ARM64 the generated code looks like this:
	0000000000000000 &lt;cpumask_next&gt;:
	   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
	   4:   11000402        add     w2, w0, #0x1
	   8:   aa0103e0        mov     x0, x1
	   c:   d2800401        mov     x1, #0x40                       // #64
	  10:   910003fd        mov     x29, sp
	  14:   93407c42        sxtw    x2, w2
	  18:   94000000        bl      0 &lt;find_next_bit&gt;
	  1c:   a8c17bfd        ldp     x29, x30, [sp], #16
	  20:   d65f03c0        ret
	  24:   d503201f        nop

After applying this patch:
	0000000000000140 &lt;cpumask_next&gt;:
	 140:   11000400        add     w0, w0, #0x1
	 144:   93407c00        sxtw    x0, w0
	 148:   f100fc1f        cmp     x0, #0x3f
	 14c:   54000168        b.hi    178 &lt;cpumask_next+0x38&gt;  // b.pmore
	 150:   f9400023        ldr     x3, [x1]
	 154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
	 158:   9ac02020        lsl     x0, x1, x0
	 15c:   52800802        mov     w2, #0x40                       // #64
	 160:   8a030001        and     x1, x0, x3
	 164:   dac00020        rbit    x0, x1
	 168:   f100003f        cmp     x1, #0x0
	 16c:   dac01000        clz     x0, x0
	 170:   1a800040        csel    w0, w2, w0, eq  // eq = none
	 174:   d65f03c0        ret
	 178:   52800800        mov     w0, #0x40                       // #64
	 17c:   d65f03c0        ret

find_next_bit() call is replaced with 6 instructions.  find_next_bit()
itself is 41 instructions plus function call overhead.

Despite inlining, the scripts/bloat-o-meter report smaller .text size
after applying the series:
	add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)

Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: inline _find_next_bit() wrappers</title>
<updated>2021-05-07T02:24:12+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2021-05-07T01:03:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5c88af59f9abc202648a431428ad9d32e5d2a201'/>
<id>5c88af59f9abc202648a431428ad9d32e5d2a201</id>
<content type='text'>
lib/find_bit.c declares five single-line wrappers for _find_next_bit().
We may turn those wrappers to inline functions.  It eliminates unneeded
function calls and opens room for compile-time optimizations.

Link: https://lkml.kernel.org/r/20210401003153.97325-8-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
lib/find_bit.c declares five single-line wrappers for _find_next_bit().
We may turn those wrappers to inline functions.  It eliminates unneeded
function calls and opens room for compile-time optimizations.

Link: https://lkml.kernel.org/r/20210401003153.97325-8-yury.norov@gmail.com
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jianpeng Ma &lt;jianpeng.ma@intel.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Wolfram Sang &lt;wsa+renesas@sang-engineering.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.osdn.me&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: make atomic helpers __always_inline</title>
<updated>2021-01-13T15:09:06+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2021-01-08T09:19:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c35a824c31834d947fb99b0c608c1b9f922b4ba0'/>
<id>c35a824c31834d947fb99b0c608c1b9f922b4ba0</id>
<content type='text'>
With UBSAN enabled and building with clang, there are occasionally
warnings like

WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed
The function arch_atomic64_or() references
the variable __initdata numa_nodes_parsed.
This is often because arch_atomic64_or lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.

for functions that end up not being inlined as intended but operating
on __initdata variables. Mark these as __always_inline, along with
the corresponding asm-generic wrappers.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With UBSAN enabled and building with clang, there are occasionally
warnings like

WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed
The function arch_atomic64_or() references
the variable __initdata numa_nodes_parsed.
This is often because arch_atomic64_or lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.

for functions that end up not being inlined as intended but operating
on __initdata variables. Mark these as __always_inline, along with
the corresponding asm-generic wrappers.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: fix ffs -Wshadow warning</title>
<updated>2020-10-26T16:00:29+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2020-10-21T15:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f6573a4044adefbd07f1bd951a2041150e888d7'/>
<id>6f6573a4044adefbd07f1bd951a2041150e888d7</id>
<content type='text'>
gcc -Wshadow warns about the ffs() definition that has the
same name as the global ffs() built-in:

include/asm-generic/bitops/builtin-ffs.h:13:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]

This is annoying because 'make W=2' warns every time this
header gets included.

Change it to use a #define instead, making callers directly
reference the builtin.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
gcc -Wshadow warns about the ffs() definition that has the
same name as the global ffs() built-in:

include/asm-generic/bitops/builtin-ffs.h:13:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]

This is annoying because 'make W=2' warns every time this
header gets included.

Change it to use a #define instead, making callers directly
reference the builtin.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
