linux-toradex.git/kernel/module.c, branch v5.16-rc6

module: change to print useful messages from elf_validity_check()

2021-11-05T22:13:10+00:00

elf_validity_check() checks ELF headers for errors and ELF Spec.
compliance and if any of them fail it returns -ENOEXEC from all of
these error paths. Almost all of them don't print any messages.

When elf_validity_check() returns an error, load_module() prints an
error message without error code. It is hard to determine why the
module ELF structure is invalid, even if load_module() prints the
error code which is -ENOEXEC in all of these cases.

Change to print useful error messages from elf_validity_check() to
clearly say what went wrong and why the ELF validity checks failed.

Remove the load_module() error message which is no longer needed.
This patch includes changes to fix build warns on 32-bit platforms:

warning: format '%llu' expects argument of type 'long long unsigned int',
but argument 3 has type 'Elf32_Off' {aka 'unsigned int'}
Reported-by: kernel test robot 

Signed-off-by: Shuah Khan 
Signed-off-by: Luis Chamberlain

module: fix validate_section_offset() overflow bug on 64-bit

2021-11-05T22:13:10+00:00

validate_section_offset() uses unsigned long local variable to
add/store shdr->sh_offset and shdr->sh_size on all platforms.
unsigned long is too short when sh_offset is Elf64_Off which
would be the case on 64bit ELF headers.

Without this fix applied we were shorting the design of modules
to have section headers placed within the 32-bit boundary (4 GiB)
instead of 64-bits when on 64-bit architectures (which allows for
up to 16,777,216 TiB). In practice this just meant we were limiting
modules sections to below 4 GiB even on 64-bit systems. This then
should not really affect any real-world use case as modules these
days obviously should likely never exceed 1 GiB in size overall.
A specially crafted invalid module might succeed to skip validation
in validate_section_offset() due to this mistake, but in such case
no impact is observed through code inspection given the correct data
types are used for the copy of the module when needed on move_module()
when the section type is not SHT_NOBITS (which indicates no the
section occupies no space on the file).

Fix the overflow problem using the right size local variable when
CONFIG_64BIT is defined.

Signed-off-by: Shuah Khan 
[mcgrof: expand commit log with possible impact if not applied]
Signed-off-by: Luis Chamberlain

module: fix clang CFI with MODULE_UNLOAD=n

2021-09-28T10:56:26+00:00

When CONFIG_MODULE_UNLOAD is disabled, the module->exit member
is not defined, causing a build failure:

kernel/module.c:4493:8: error: no member named 'exit' in 'struct module'
                mod->exit = *exit;

add an #ifdef block around this.

Fixes: cf68fffb66d6 ("add support for Clang CFI")
Acked-by: Kees Cook 
Reviewed-by: Sami Tolvanen 
Reviewed-by: Miroslav Benes 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Jessica Yu

printk: Userspace format indexing support

2021-07-19T09:57:48+00:00

We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
   remediation for the machine), rinse, and repeat.

As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
/printk/index/, which emits the following format, both
readable by humans and able to be parsed by machines:

    $ head -1 vmlinux; shuf -n 5 vmlinux
    #  filename:line function "format"
    <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
    <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
    <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
    <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
    <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down 
Cc: Petr Mladek 
Cc: Jessica Yu 
Cc: Sergey Senozhatsky 
Cc: John Ogness 
Cc: Steven Rostedt 
Cc: Greg Kroah-Hartman 
Cc: Johannes Weiner 
Cc: Kees Cook 
Reviewed-by: Petr Mladek 
Tested-by: Petr Mladek 
Reported-by: kernel test robot 
Acked-by: Andy Shevchenko 
Acked-by: Jessica Yu  # for module.{c,h}
Signed-off-by: Petr Mladek 
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name

module: add printk formats to add module build ID to stacktraces

2021-07-08T18:48:22+00:00

Let's make kernel stacktraces easier to identify by including the build
ID[1] of a module if the stacktrace is printing a symbol from a module.
This makes it simpler for developers to locate a kernel module's full
debuginfo for a particular stacktrace.  Combined with
scripts/decode_stracktrace.sh, a developer can download the matching
debuginfo from a debuginfod[2] server and find the exact file and line
number for the functions plus offsets in a stacktrace that match the
module.  This is especially useful for pstore crash debugging where the
kernel crashes are recorded in something like console-ramoops and the
recovery kernel/modules are different or the debuginfo doesn't exist on
the device due to space concerns (the debuginfo can be too large for space
limited devices).

Originally, I put this on the %pS format, but that was quickly rejected
given that %pS is used in other places such as ftrace where build IDs
aren't meaningful.  There was some discussions on the list to put every
module build ID into the "Modules linked in:" section of the stacktrace
message but that quickly becomes very hard to read once you have more than
three or four modules linked in.  It also provides too much information
when we don't expect each module to be traversed in a stacktrace.  Having
the build ID for modules that aren't important just makes things messy.
Splitting it to multiple lines for each module quickly explodes the number
of lines printed in an oops too, possibly wrapping the warning off the
console.  And finally, trying to stash away each module used in a
callstack to provide the ID of each symbol printed is cumbersome and would
require changes to each architecture to stash away modules and return
their build IDs once unwinding has completed.

Instead, we opt for the simpler approach of introducing new printk formats
'%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
for "pointer backtrace with module build ID" and then updating the few
places in the architecture layer where the stacktrace is printed to use
this new format.

Before:

 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm]
  direct_entry+0x16c/0x1b4 [lkdtm]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8

After:

 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8

[akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
[rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
  Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
[akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
[cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
  Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com

Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd 
Signed-off-by: Bixuan Cui 
Signed-off-by: Randy Dunlap 
Cc: Jiri Olsa 
Cc: Alexei Starovoitov 
Cc: Jessica Yu 
Cc: Evan Green 
Cc: Hsin-Yi Wang 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Sergey Senozhatsky 
Cc: Andy Shevchenko 
Cc: Rasmus Villemoes 
Cc: Matthew Wilcox 
Cc: Baoquan He 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Dave Young 
Cc: Ingo Molnar 
Cc: Konstantin Khlebnikov 
Cc: Sasha Levin 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

2021-07-07T18:41:32+00:00

Pull module updates from Jessica Yu:

 - Fix incorrect logic in module_kallsyms_on_each_symbol()

 - Fix for a Coccinelle warning

* tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: correctly exit module_kallsyms_on_each_symbol when fn() != 0
  kernel/module: Use BUG_ON instead of if condition followed by BUG

module: limit enabling module.sig_enforce

2021-06-22T18:13:19+00:00

Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.

Fixes: fda784e50aac ("module: export module signature enforcement status")
Reported-by: Nayna Jain 
Tested-by: Mimi Zohar 
Tested-by: Jessica Yu 
Signed-off-by: Mimi Zohar 
Signed-off-by: Jessica Yu 
Signed-off-by: Linus Torvalds

module: correctly exit module_kallsyms_on_each_symbol when fn() != 0

2021-05-26T12:55:45+00:00

Commit 013c1667cf78 ("kallsyms: refactor
{,module_}kallsyms_on_each_symbol") replaced the return inside the
nested loop with a break, changing the semantics of the function: the
break only exits the innermost loop, so the code continues iterating the
symbols of the next module instead of exiting.

Fixes: 013c1667cf78 ("kallsyms: refactor {,module_}kallsyms_on_each_symbol")
Reviewed-by: Petr Mladek 
Reviewed-by: Miroslav Benes 
Signed-off-by: Jon Mediero 
Signed-off-by: Jessica Yu

module: check for exit sections in layout_sections() instead of module_init_section()

2021-05-17T07:48:24+00:00

Previously, when CONFIG_MODULE_UNLOAD=n, the module loader just does not
attempt to load exit sections since it never expects that any code in those
sections will ever execute. However, dynamic code patching (alternatives,
jump_label and static_call) can have sites in __exit code, even if __exit is
never executed. Therefore __exit must be present at runtime, at least for as
long as __init code is.

Commit 33121347fb1c ("module: treat exit sections the same as init
sections when !CONFIG_MODULE_UNLOAD") solves the requirements of
jump_labels and static_calls by putting the exit sections in the init
region of the module so that they are at least present at init, and
discarded afterwards. It does this by including a check for exit
sections in module_init_section(), so that it also returns true for exit
sections, and the module loader will automatically sort them in the init
region of the module.

However, the solution there was not completely arch-independent. ARM is
a special case where it supplies its own module_{init, exit}_section()
functions. Instead of pushing the exit section checks into
module_init_section(), just implement the exit section check in
layout_sections(), so that we don't have to touch arch-dependent code.

Fixes: 33121347fb1c ("module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD")
Reviewed-by: Russell King (Oracle) 
Signed-off-by: Jessica Yu

kernel/module: Use BUG_ON instead of if condition followed by BUG

2021-05-14T07:50:56+00:00

Fix the following coccinelle report:

kernel/module.c:1018:2-5:
WARNING: Use BUG_ON instead of if condition followed by BUG.

BUG_ON uses unlikely in if(). Through disassembly, we can see that
brk #0x800 is compiled to the end of the function.
As you can see below:
    ......
    ffffff8008660bec:   d65f03c0    ret
    ffffff8008660bf0:   d4210000    brk #0x800

Usually, the condition in if () is not satisfied. For the
multi-stage pipeline, we do not need to perform fetch decode
and excute operation on brk instruction.

In my opinion, this can improve the efficiency of the
multi-stage pipeline.

Signed-off-by: zhouchuangao 
Signed-off-by: Jessica Yu