|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates for 7.0
- Implement masked user access
- Add bpf support for internal only per-CPU instructions and inline the
bpf_get_smp_processor_id() and bpf_get_current_task() functions
- Fix pSeries MSI-X allocation failure when quota is exceeded
- Fix recursive pci_lock_rescan_remove locking in EEH event handling
- Support tailcalls with subprogs & BPF exceptions on 64bit
- Extend "trusted" keys to support the PowerVM Key Wrapping Module
(PKWM)
Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.
* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
powerpc/pseries: plpks: export plpks_wrapping_is_supported
docs: trusted-encryped: add PKWM as a new trust source
keys/trusted_keys: establish PKWM as a trusted source
pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
pseries/plpks: expose PowerVM wrapping features via the sysfs
powerpc/pseries: move the PLPKS config inside its own sysfs directory
pseries/plpks: fix kernel-doc comment inconsistencies
powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
powerpc: kgdb: Remove OUTBUFMAX constant
powerpc64/bpf: Additional NVR handling for bpf_throw
powerpc64/bpf: Support exceptions
powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
powerpc64/bpf: Avoid tailcall restore from trampoline
powerpc64/bpf: Support tailcalls with subprogs
powerpc64/bpf: Moving tail_call_cnt to bottom of frame
powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- A nice cleanup to the paravirt code containing a unification of the
paravirt clock interface, taming the include hell by splitting the
pv_ops structure and removing a bunch of obsolete code (Juergen
Gross)
* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
x86/paravirt: Remove trailing semicolons from alternative asm templates
x86/pvlocks: Move paravirt spinlock functions into own header
x86/paravirt: Specify pv_ops array in paravirt macros
x86/paravirt: Allow pv-calls outside paravirt.h
objtool: Allow multiple pv_ops arrays
x86/xen: Drop xen_mmu_ops
x86/xen: Drop xen_cpu_ops
x86/xen: Drop xen_irq_ops
x86/paravirt: Move pv_native_*() prototypes to paravirt.c
x86/paravirt: Introduce new paravirt-base.h header
x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
x86/paravirt: Use common code for paravirt_steal_clock()
riscv/paravirt: Use common code for paravirt_steal_clock()
loongarch/paravirt: Use common code for paravirt_steal_clock()
arm64/paravirt: Use common code for paravirt_steal_clock()
arm/paravirt: Use common code for paravirt_steal_clock()
sched: Move clock related paravirt code to kernel/sched
paravirt: Remove asm/paravirt_api_clock.h
x86/paravirt: Move thunk macros to paravirt_types.h
...
|
|
The hypervisor generated wrapping key is an AES-GCM-256 symmetric key which
is stored in a non-volatile, secure, and encrypted storage called the Power
LPAR Platform KeyStore. It has policy-based protections that prevent it
from being read out or exposed to the user.
Implement H_PKS_GEN_KEY, H_PKS_WRAP_OBJECT, and H_PKS_UNWRAP_OBJECT HCALLs
to enable using the PowerVM Key Wrapping Module (PKWM) as a new trust
source for trusted keys. Disallow H_PKS_READ_OBJECT, H_PKS_SIGNED_UPDATE,
and H_PKS_WRITE_OBJECT for objects with the 'wrapping key' policy set.
Capture the availability status for the H_PKS_WRAP_OBJECT interface.
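As a rough illustration only, a trust-source setup path could gate on the
availability helper exported by this series; the function name, header
location and error handling below are assumptions, only
plpks_wrapping_is_supported() comes from the series:
#include <linux/errno.h>
#include <asm/plpks.h>          /* assumed location of the declaration */
/* Hypothetical init hook for a PKWM-backed trust source. */
static int pkwm_trust_source_init(void)
{
        if (!plpks_wrapping_is_supported())
                return -ENODEV;         /* H_PKS_WRAP_OBJECT not available */
        return 0;
}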
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-5-ssrish@linux.ibm.com
|
|
Starting with Power11, PowerVM supports a new feature called "Key Wrapping"
that protects user secrets by wrapping them using a hypervisor generated
wrapping key. The status of this feature can be read by the
H_PKS_GET_CONFIG HCALL.
Expose the Power LPAR Platform KeyStore (PLPKS) wrapping features config
via the sysfs file /sys/firmware/plpks/config/wrapping_features.
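For illustration, a minimal userspace read of the new attribute might look
like this; only the sysfs path comes from the commit message, the file's
exact output format is assumed to be a printable string:
#include <stdio.h>
int main(void)
{
        char buf[128];
        FILE *f = fopen("/sys/firmware/plpks/config/wrapping_features", "r");
        if (!f)
                return 1;       /* PLPKS wrapping features not exposed here */
        if (fgets(buf, sizeof(buf), f))
                printf("wrapping_features: %s", buf);
        fclose(f);
        return 0;
}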
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-4-ssrish@linux.ibm.com
|
|
The /sys/firmware/secvar/config directory represents Power LPAR Platform
KeyStore (PLPKS) configuration properties such as max_object_size,
signed_update_algorithms, supported_policies, total_size, used_space,
and version.
These attributes describe the PLPKS, and not the secure boot variables
(secvars).
Create /sys/firmware/plpks directory and move the PLPKS config inside this
directory. For backwards compatibility, create a soft link from the secvar
sysfs directory to this config and emit a warning stating that the older
sysfs path has been deprecated. Separate out the plpks-specific
documentation from secvar.
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-3-ssrish@linux.ibm.com
|
|
Fix the comments of all applicable functions to be consistent with the
kernel-doc format, and move them before the function definition rather
than the function prototype.
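For reference, the kernel-doc layout the comments are being aligned to
looks like this; the function shown is purely illustrative, not part of
the PLPKS code:
#include <linux/errno.h>
/**
 * plpks_example_op() - One-line summary of what the function does.
 * @arg: Meaning of the argument.
 *
 * Longer description, placed immediately above the function definition
 * rather than next to the prototype in the header.
 *
 * Return: 0 on success, a negative errno otherwise.
 */
static int plpks_example_op(int arg)
{
        return arg ? 0 : -EINVAL;
}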
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-2-ssrish@linux.ibm.com
|
|
This constant was introduced in commit 17ce452f7ea3 ("kgdb, powerpc:
arch specific powerpc kgdb support"), but it is no longer used anywhere
in the source tree.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915141808.146695-1-mikisabate@gmail.com
|
|
For consistency with __vdso_clock_gettime64() there should also be a
64-bit variant of clock_getres(). This will allow the extension of
CONFIG_COMPAT_32BIT_TIME to the vDSO and finally the removal of 32-bit
time types from the kernel and UAPI.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260114-vdso-powerpc-align-v1-1-acf09373d568@linutronix.de
|
|
Paravirt clock related functions are available in multiple archs.
In order to share the common parts, move the common static keys
to kernel/sched/ and remove them from the arch specific files.
Make a common paravirt_steal_clock() implementation available in
kernel/sched/cputime.c, guarding it with a new config option
CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch
in case it wants to use that common variant.
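For context, the scheduler-side consumer of paravirt_steal_clock()
accumulates deltas roughly like this; a simplified sketch modelled on
steal_account_process_time(), not the code being moved by this patch:
/* Charge the time stolen by the hypervisor since the last update. */
static u64 account_steal_delta(void)
{
        u64 steal = paravirt_steal_clock(smp_processor_id());

        steal -= this_rq()->prev_steal_time;    /* new steal time only */
        this_rq()->prev_steal_time += steal;
        account_steal_time(steal);

        return steal;
}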
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
|
|
All architectures supporting CONFIG_PARAVIRT share the same contents
of asm/paravirt_api_clock.h:
#include <asm/paravirt.h>
So remove all incarnations of asm/paravirt_api_clock.h and remove the
only place where it is included, as asm/paravirt.h is included there
anyway.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc, scheduler bits
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-6-jgross@suse.com
|
|
The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
hotplug safe") restructured the EEH driver to improve synchronization
with the PCI hotplug layer.
However, it inadvertently moved pci_lock_rescan_remove() outside its
intended scope in eeh_handle_normal_event(), leading to broken PCI
error reporting and improper EEH event triggering. Specifically,
eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
acquire the same lock internally, causing nested locking and disrupting
normal EEH event handling paths.
This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
with two public wrappers:
- eeh_pe_bus_get(), with locking enabled.
- eeh_pe_bus_get_nolock(), which skips locking.
Callers that already hold pci_lock_rescan_remove() now use
eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
Additionally, pci_lock_rescan_remove() calls are restored to the correct
position—after eeh_pe_bus_get() and immediately before iterating affected
PEs and devices. This ensures EEH-triggered PCI removes occur under proper
bus rescan locking without recursive lock contention.
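The resulting structure is roughly the following; a simplified sketch,
where eeh_pe_bus_lookup() is a placeholder for the existing, unchanged
lookup logic:
static struct pci_bus *_eeh_pe_bus_get(struct eeh_pe *pe, bool do_lock)
{
        struct pci_bus *bus;

        if (do_lock)
                pci_lock_rescan_remove();
        bus = eeh_pe_bus_lookup(pe);    /* placeholder for existing logic */
        if (do_lock)
                pci_unlock_rescan_remove();

        return bus;
}
struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
{
        return _eeh_pe_bus_get(pe, true);
}
/* For callers already holding pci_rescan_remove_lock. */
struct pci_bus *eeh_pe_bus_get_nolock(struct eeh_pe *pe)
{
        return _eeh_pe_bus_get(pe, false);
}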
The eeh_pe_loc_get() function has been split into two functions:
- eeh_pe_loc_get(struct eeh_pe *pe), which retrieves the location code
  for a given PE.
- eeh_pe_loc_get_bus(struct pci_bus *bus), which retrieves the location
  code for a given bus.
This resolves lockdep warnings such as:
<snip>
[ 84.964298] [ T928] ============================================
[ 84.964304] [ T928] WARNING: possible recursive locking detected
[ 84.964311] [ T928] 6.18.0-rc3 #51 Not tainted
[ 84.964315] [ T928] --------------------------------------------
[ 84.964320] [ T928] eehd/928 is trying to acquire lock:
[ 84.964324] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[ 84.964342] [ T928]
but task is already holding lock:
[ 84.964347] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[ 84.964357] [ T928]
other info that might help us debug this:
[ 84.964363] [ T928] Possible unsafe locking scenario:
[ 84.964367] [ T928] CPU0
[ 84.964370] [ T928] ----
[ 84.964373] [ T928] lock(pci_rescan_remove_lock);
[ 84.964378] [ T928] lock(pci_rescan_remove_lock);
[ 84.964383] [ T928]
*** DEADLOCK ***
[ 84.964388] [ T928] May be due to missing lock nesting notation
[ 84.964393] [ T928] 1 lock held by eehd/928:
[ 84.964397] [ T928] #0: c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[ 84.964408] [ T928]
stack backtrace:
[ 84.964414] [ T928] CPU: 2 UID: 0 PID: 928 Comm: eehd Not tainted 6.18.0-rc3 #51 VOLUNTARY
[ 84.964417] [ T928] Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
[ 84.964419] [ T928] Call Trace:
[ 84.964420] [ T928] [c0000011a7157990] [c000000001705de4] dump_stack_lvl+0xc8/0x130 (unreliable)
[ 84.964424] [ T928] [c0000011a71579d0] [c0000000002f66e0] print_deadlock_bug+0x430/0x440
[ 84.964428] [ T928] [c0000011a7157a70] [c0000000002fd0c0] __lock_acquire+0x1530/0x2d80
[ 84.964431] [ T928] [c0000011a7157ba0] [c0000000002fea54] lock_acquire+0x144/0x410
[ 84.964433] [ T928] [c0000011a7157cb0] [c0000011a7157cb0] __mutex_lock+0xf4/0x1050
[ 84.964436] [ T928] [c0000011a7157e00] [c000000000de21d8] pci_lock_rescan_remove+0x28/0x40
[ 84.964439] [ T928] [c0000011a7157e20] [c00000000004ed98] eeh_pe_bus_get+0x48/0xc0
[ 84.964442] [ T928] [c0000011a7157e50] [c000000000050434] eeh_handle_normal_event+0x64/0xa60
[ 84.964446] [ T928] [c0000011a7157f30] [c000000000051de8] eeh_event_handler+0xf8/0x190
[ 84.964450] [ T928] [c0000011a7157f90] [c0000000002747ac] kthread+0x16c/0x180
[ 84.964453] [ T928] [c0000011a7157fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
</snip>
Fixes: 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe")
Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251210142559.8874-1-nnmlinux@linux.ibm.com
|
|
Masked user access avoids the address/size verification by access_ok().
Although its main purpose is to skip the speculation in the
verification of user address and size, and hence avoid the need for spec
mitigation, it also has the advantage of reducing the number of
instructions required, so it even benefits platforms that don't
need speculation mitigation, especially when the size of the copy is
not known at build time.
So implement masked user access on powerpc. The only requirement is
to have a memory gap that faults between the top of user space and the
real start of the kernel area.
On 64-bit platforms the address space is divided this way:
0xffffffffffffffff +------------------+
| |
| kernel space |
| |
0xc000000000000000 +------------------+ <== PAGE_OFFSET
|//////////////////|
|//////////////////|
0x8000000000000000 |//////////////////|
|//////////////////|
|//////////////////|
0x0010000000000000 +------------------+ <== TASK_SIZE_MAX
| |
| user space |
| |
0x0000000000000000 +------------------+
The kernel is always above 0x8000000000000000 and user space always
below, with a gap in between. It leads to a 3-instruction sequence:
150: 7c 69 fe 76 sradi r9,r3,63
154: 79 29 00 40 clrldi r9,r9,1
158: 7c 63 48 78 andc r3,r3,r9
This sequence leaves r3 unmodified when it is below 0x8000000000000000
and clamps it to 0x8000000000000000 if it is above.
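In C, the branchless clamp performed by those three instructions is
roughly the following; a sketch of the idea, not the exact kernel helper:
/* Addresses below 0x8000000000000000 pass through unchanged; anything
 * above is clamped to exactly 0x8000000000000000, which lands in the
 * faulting gap, all without a conditional branch to speculate across. */
static inline unsigned long mask_user_addr_64(unsigned long addr)
{
        unsigned long m = (unsigned long)((long)addr >> 63); /* sradi: 0 or ~0UL */

        m &= ~(1UL << 63);              /* clrldi: 0 or 0x7fffffffffffffff */
        return addr & ~m;               /* andc */
}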
On 32-bit it is trickier. In theory user space can go up to
0xbfffffff while the kernel will usually start at 0xc0000000, so a gap
needs to be added in between. Although in theory a single 4k page
would suffice, it is easier and more efficient to enforce a 128k gap
below the kernel, as it simplifies the masking.
e500 has the isel instruction, which allows selecting one value or
the other without a branch, and that instruction is not speculative, so
use it. Although GCC usually generates code using that instruction,
it is safer to use inline assembly to be sure. The result is:
14: 3d 20 bf fe lis r9,-16386
18: 7c 03 48 40 cmplw r3,r9
1c: 7c 69 18 5e iselgt r3,r9,r3
On other platforms, when kernel space is above 0x80000000 and user space
is below, the logic in mask_user_address_simple() leads to a
3-instruction sequence:
64: 7c 69 fe 70 srawi r9,r3,31
68: 55 29 00 7e clrlwi r9,r9,1
6c: 7c 63 48 78 andc r3,r3,r9
This is the default on powerpc 8xx.
When the limit between user space and kernel space is not 0x80000000,
mask_user_address_32() is used and a 6-instruction sequence is
generated:
24: 54 69 7c 7e srwi r9,r3,17
28: 21 29 57 ff subfic r9,r9,22527
2c: 7d 29 fe 70 srawi r9,r9,31
30: 75 2a b0 00 andis. r10,r9,45056
34: 7c 63 48 78 andc r3,r3,r9
38: 7c 63 53 78 or r3,r3,r10
The constraint is that TASK_SIZE must be aligned to 128K in order to
get the optimal number of instructions.
When CONFIG_PPC_BARRIER_NOSPEC is not defined, fall back on the
test-based masking, as it is quicker than the 6-instruction sequence
but not quicker than the 3-instruction sequences above.
As an example, although barrier_nospec() is a no-op on the 8xx, this
change has the following impact on strncpy_from_user(): the length of
the function is reduced from 488 to 340 bytes:
Start of the function with the patch:
00000000 <strncpy_from_user>:
0: 7c ab 2b 79 mr. r11,r5
4: 40 81 01 40 ble 144 <strncpy_from_user+0x144>
8: 7c 89 fe 70 srawi r9,r4,31
c: 55 29 00 7e clrlwi r9,r9,1
10: 7c 84 48 78 andc r4,r4,r9
14: 3d 20 dc 00 lis r9,-9216
18: 7d 3a c3 a6 mtspr 794,r9
1c: 2f 8b 00 03 cmpwi cr7,r11,3
20: 40 9d 00 b4 ble cr7,d4 <strncpy_from_user+0xd4>
...
Start of the function without the patch:
00000000 <strncpy_from_user>:
0: 7c a0 2b 79 mr. r0,r5
4: 40 81 01 10 ble 114 <strncpy_from_user+0x114>
8: 2f 84 00 00 cmpwi cr7,r4,0
c: 41 9c 01 30 blt cr7,13c <strncpy_from_user+0x13c>
10: 3d 20 80 00 lis r9,-32768
14: 7d 24 48 50 subf r9,r4,r9
18: 7f 80 48 40 cmplw cr7,r0,r9
1c: 7c 05 03 78 mr r5,r0
20: 41 9d 01 00 bgt cr7,120 <strncpy_from_user+0x120>
24: 3d 20 80 00 lis r9,-32768
28: 7d 25 48 50 subf r9,r5,r9
2c: 7f 84 48 40 cmplw cr7,r4,r9
30: 38 e0 ff f2 li r7,-14
34: 41 9d 00 e4 bgt cr7,118 <strncpy_from_user+0x118>
38: 94 21 ff e0 stwu r1,-32(r1)
3c: 3d 20 dc 00 lis r9,-9216
40: 7d 3a c3 a6 mtspr 794,r9
44: 2b 85 00 03 cmplwi cr7,r5,3
48: 40 9d 01 6c ble cr7,1b4 <strncpy_from_user+0x1b4>
...
118: 7c e3 3b 78 mr r3,r7
11c: 4e 80 00 20 blr
120: 7d 25 4b 78 mr r5,r9
124: 3d 20 80 00 lis r9,-32768
128: 7d 25 48 50 subf r9,r5,r9
12c: 7f 84 48 40 cmplw cr7,r4,r9
130: 38 e0 ff f2 li r7,-14
134: 41 bd ff e4 bgt cr7,118 <strncpy_from_user+0x118>
138: 4b ff ff 00 b 38 <strncpy_from_user+0x38>
13c: 38 e0 ff f2 li r7,-14
140: 4b ff ff d8 b 118 <strncpy_from_user+0x118>
...
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8f418183d9125cc0bf23922bc2ef2a1130d8b63a.1766574657.git.chleroy@kernel.org
|
|
At the moment, TASK_SIZE can be customized by the user via Kconfig,
but it is not possible to check all constraints in Kconfig. Impossible
setups are detected at compile time with BUILD_BUG(), but that leads
to build failures when setting crazy values. That is not a problem on
its own because the user will usually either use the default value or
set a well-thought-out value. However, build robots generate crazy
random configs that lead to build failures, and build robots see it as
a regression every time a patch adds such a constraint.
So instead of failing the build when the custom TASK_SIZE is too
big, just adjust it to the maximum possible value matching the setup.
Several architectures already calculate TASK_SIZE based on other
parameters and options.
In order to do so, move MODULES_VADDR calculation into task_size_32.h
and ensure that:
- On book3s/32, userspace and module area have their own segments (256M)
- On 8xx, userspace has its own full PGDIR entries (4M)
Then TASK_SIZE is guaranteed to be correct so remove related
BUILD_BUG()s.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/6a2575420770d075cd090b5a316730a2ffafdee4.1766574657.git.chleroy@kernel.org
|
|
For book3s/32 it is assumed that TASK_SIZE is a multiple of 256 Mbytes,
but Kconfig allows any value for TASK_SIZE.
In all relevant calculations, align TASK_SIZE to the upper 256 Mbytes
boundary.
Also use ASM_CONST() in the definition of TASK_SIZE to ensure it is
seen as an unsigned constant.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8928d906079e156c59794c41e826a684eaaaebb4.1766574657.git.chleroy@kernel.org
|
|
user_read_access_begin() and user_write_access_begin() and
user_access_begin() are now very similar. Create a common
__user_access_begin() that takes direction as parameter.
In order to avoid a warning with the conditional call of
barrier_nospec(), which is sometimes an empty macro, change it to a
do {} while (0).
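The warning in question is the usual empty-expansion-in-a-conditional
case; an illustrative (not verbatim) example:
#include <linux/types.h>

#define barrier_nospec()        do { } while (0)        /* no-op variant */

/* With an empty expansion, the 'if' body below would be an empty
 * statement and can trip compiler warnings; the do/while form always
 * expands to a well-formed statement. */
static inline void maybe_barrier(bool need_barrier)
{
        if (need_barrier)
                barrier_nospec();
}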
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2b4f9d4e521e0b56bf5cb239916b4a178c4d2007.1766574657.git.chleroy@kernel.org
|
|
{allow/prevent}_{read/write/read_write}_{from/to/}_user()
The following six functions have become simple single-line functions
that do not have much added value anymore:
- allow_read_from_user()
- allow_write_to_user()
- allow_read_write_user()
- prevent_read_from_user()
- prevent_write_to_user()
- prevent_read_write_user()
Directly call allow_user_access() and prevent_user_access() instead;
it doesn't reduce readability and it removes unnecessary middle
functions.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/70971f0ba81eab742a120e5bfdeff6b42d08fd98.1766574657.git.chleroy@kernel.org
|
|
Since commit 16132529cee5 ("powerpc/32s: Rework Kernel Userspace
Access Protection") the size parameter is unused on all platforms.
And the 'from' parameter has never been used.
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/4552b00707923b71150ee47b925d6eaae1b03261.1766574657.git.chleroy@kernel.org
|
|
Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to
copy_from_user()") added a redundant barrier_nospec() in
copy_from_user(), because powerpc is already calling
barrier_nospec() in allow_read_from_user() and
allow_read_write_user(). But on other architectures that
call to barrier_nospec() was missing. So change powerpc
instead of reverting the above commit and having to fix
other architectures one by one. This is now possible
because barrier_nospec() has also been added in
copy_from_user_iter().
Move barrier_nospec() out of allow_read_from_user() and
allow_read_write_user(). This will also allow reuse of those
functions when implementing masked user access which doesn't
require barrier_nospec().
Don't add it back in raw_copy_from_user() as it is already called
by copy_from_user() and copy_from_user_iter().
Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f29612105c5fcbc8ceb7303808ddc1a781f0f6b5.1766574657.git.chleroy@kernel.org
|
|
Commit 2997876c4a1a ("powerpc/32: Restore clearing of MSR[RI] at
interrupt/syscall exit") delayed clearing of MSR[RI], but missed that
both MSR[RI] and MSR[EE] are cleared at the same time, so the commit
also delayed the disabling of interrupts, leading to unexpected
behaviour.
To fix that, mostly revert the blamed commit and restore the clearing
of MSR[RI] in interrupt_exit_kernel_prepare() instead. For 8xx it
implies adding a synchronising instruction after the mtspr in order to
make sure no instruction counter interrupt (used for perf events) will
fire just after clearing MSR[RI].
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Closes: https://lore.kernel.org/all/4d0bd05d-6158-1323-3509-744d3fbe8fc7@xenosoft.de/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6b05eb1c-fdef-44e0-91a7-8286825e68f1@roeck-us.net/
Fixes: 2997876c4a1a ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/585ea521b2be99d293b539bbfae148366cfb3687.1766146895.git.chleroy@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in lib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rapoport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping updates from Marek Szyprowski:
- More DMA mapping API refactoring to physical addresses as the primary
interface instead of page+offset parameters.
This time dma_map_ops callbacks are converted to physical addresses,
which in turn also results in some simplification of
architecture-specific code (Leon Romanovsky and Jason Gunthorpe)
- Clarify that dma_map_benchmark is not a kernel self-test, but a
standalone tool (Qinxin Xia)
* tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: remove unused map_page callback
xen: swiotlb: Convert mapping routine to rely on physical address
x86: Use physical address for DMA mapping
sparc: Use physical address DMA mapping
powerpc: Convert to physical address DMA mapping
parisc: Convert DMA map_page to map_phys interface
MIPS/jazzdma: Provide physical address directly
alpha: Convert mapping routine to rely on physical address
dma-mapping: remove unused mapping resource callbacks
xen: swiotlb: Switch to physical address mapping callbacks
ARM: dma-mapping: Switch to physical address mapping callbacks
ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*()
dma-mapping: convert dummy ops to physical address mapping
dma-mapping: prepare dma_map_ops to conversion to physical address
tools/dma: move dma_map_benchmark from selftests to tools/dma
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit
- Fix unpaired stwcx on interrupt exit on 32-bit
- Fix race condition leading to double list-add in
mac_hid_toggle_emumouse()
- Fix mprotect on book3s 32-bit
- Fix SLB multihit issue during SLB preload with 64-bit hash MMU
- Add support for crashkernel CMA reservation
- Add die_id and die_cpumask for Power10 & later to expose chip
hemispheres
- A series of minor fixes and improvements to the hash SLB code
Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury,
Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom,
J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor,
Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar
Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote,
and Vishal Chourasia.
* tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits)
macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h>
powerpc/powermac: backlight: Include <linux/of.h>
powerpc/64s/slb: Add no_slb_preload early cmdline param
powerpc/64s/slb: Make preload_add return type as void
powerpc/ptdump: Dump PXX level info for kernel_page_tables
powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
powerpc/64s/hash: Update directMap page counters for Hash
powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
powerpc/64s/hash: Improve hash mmu printk messages
powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
powerpc/64s/slb: Fix SLB multihit issue during SLB preload
powerpc, mm: Fix mprotect on book3s 32-bit
powerpc/smp: Expose die_id and die_cpumask
powerpc/83xx: Add a null pointer check to mcu_gpiochip_add
arch:powerpc:tools This file was missing shebang line, so added it
kexec: Include kernel-end even without crashkernel
powerpc: p2020: Rename wdt@ nodes to watchdog@
powerpc: 86xx: Rename wdt@ nodes to watchdog@
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
- Introduction of the generic IO page-table framework with support for
Intel and AMD IOMMU formats from Jason.
This has good potential for unifying more IO page-table
implementations and making future enhancements easier. But this
also needed quite a few fixes during development. All known issues
have been fixed, but my feeling is that there is a higher potential
than usual that more might be needed.
- Intel VT-d updates:
- Use right invalidation hint in qi_desc_iotlb()
- Reduce the scope of INTEL_IOMMU_FLOPPY_WA
- ARM-SMMU updates:
- Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
and a new clock for the TBU.
- Fix error handling if level 1 CD table allocation fails.
- Permit more than the architectural maximum number of SMRs for
funky Qualcomm mis-implementations of SMMUv2.
- Mediatek driver:
- MT8189 iommu support
- Move ARM IO-pgtable selftests to kunit
- Device leak fixes for a couple of drivers
- Random smaller fixes and improvements
* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
powerpc/pseries/svm: Make mem_encrypt.h self contained
genpt: Make GENERIC_PT invisible
iommupt: Avoid a compiler bug with sw_bit
iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
iommupt: Fix unlikely flows in increase_top()
iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
MAINTAINERS: Update my email address
iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
iommu/vt-d: Restore previous domain::aperture_end calculation
iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
iommu/tegra: fix device leak on probe_device()
iommu/sun50i: fix device leak on of_xlate()
iommu/omap: simplify probe_device() error handling
iommu/omap: fix device leaks on probe_device()
iommu/mediatek-v1: add missing larb count sanity check
iommu/mediatek-v1: fix device leaks on probe()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n.
This makes it possible to access a 64bit value in generic code with
[unsafe_]get_user(). All other architectures and ARM variants
provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access
helpers always goes through a local C scope label indirection
inside the helpers.
This is required because compilers do not support an ASM GOTO
target leaving an auto cleanup scope. GCC silently fails to emit
the cleanup invocation and Clang fails the build.
[ Editor's note: gcc-16 will have fixed the code generation issue
in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
goto jumps [PR122835]"). But we obviously have to deal with clang
and older versions of gcc, so.. - Linus ]
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but
if it has to be done with user controlled pointers, the access is
shielded with a speculation barrier, so that the CPU cannot
speculate around the address range check. Those speculation
barriers impact performance quite significantly.
This cost can be avoided by "masking" the provided pointer so it is
guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This
has to be done without branches so it creates an address dependency
for the access, which the CPU cannot speculate ahead.
This results in repeating and error prone programming patterns:
if (can_do_masked_user_access())
from = masked_user_read_access_begin((from));
else if (!user_read_access_begin(from, sizeof(*from)))
return -EFAULT;
unsafe_get_user(val, from, Efault);
user_read_access_end();
return 0;
Efault:
user_read_access_end();
return -EFAULT;
which can be replaced with scopes and automatic cleanup:
scoped_user_read_access(from, Efault)
unsafe_get_user(val, from, Efault);
return 0;
Efault:
return -EFAULT;
- Convert code which implements the above pattern over to
scope_user.*.access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most
architectures, but prevent PowerPC from implementing the masking
optimization.
- Add a missing speculation barrier in copy_from_user_iter()"
* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
scm: Convert put_cmsg() to scoped user access
iov_iter: Add missing speculation barrier to copy_from_user_iter()
iov_iter: Convert copy_from_user_iter() to masked user access
select: Convert to scoped user access
x86/futex: Convert to scoped user access
futex: Convert to get/put_user_inline()
uaccess: Provide put/get_user_inline()
uaccess: Provide scoped user access regions
arm64: uaccess: Use unsafe wrappers for ASM GOTO
s390/uaccess: Use unsafe wrappers for ASM GOTO
riscv/uaccess: Use unsafe wrappers for ASM GOTO
powerpc/uaccess: Use unsafe wrappers for ASM GOTO
x86/uaccess: Use unsafe wrappers for ASM GOTO
uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ARM: uaccess: Implement missing __get_user_asm_dword()
|
|
Add the missing forward declarations and includes so it does not have
implicit dependencies. mem_encrypt.h is a public header imported by
drivers. Users should not have to guess what include files are needed.
Resolves a kbuild splat:
In file included from drivers/iommu/generic_pt/fmt/iommu_amdv1.c:15:
In file included from drivers/iommu/generic_pt/fmt/iommu_template.h:36:
In file included from drivers/iommu/generic_pt/fmt/amdv1.h:23:
In file included from include/linux/mem_encrypt.h:17:
>> arch/powerpc/include/asm/mem_encrypt.h:13:49: warning: declaration of 'struct device' will not be visible outside of this function [-Wvisibility]
13 | static inline bool force_dma_unencrypted(struct device *dev)
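The fix boils down to making the header carry its own declarations;
roughly as follows (a simplified sketch, not the exact patch):
/* arch/powerpc/include/asm/mem_encrypt.h, sketched: forward-declare what
 * the header uses so it no longer depends on who included it first. */
#include <asm/svm.h>

struct device;

static inline bool force_dma_unencrypted(struct device *dev)
{
        return is_secure_guest();
}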
Fixes: 879ced2bab1b ("iommupt: Add the AMD IOMMU v1 page table format")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511161358.rS5pSb3U-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction, typically after every 256 context
switches, to remove old entries.
To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.
If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.
The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without a
MMU context switch.
CPU 0 CPU 1
----- -----
Process P
exec swapper/1
load_elf_binary
begin_new_exc
activate_mm
switch_mm_irqs_off
switch_mmu_context
switch_slb
/*
* This invalidates all
* the entries in the HW
* and setup the new HW
* SLB entries as per the
* preload cache.
*/
context_switch
sched_migrate_task migrates process P to cpu-1
Process swapper/0 context switch (to process P)
(uses mm_struct of Process P) switch_mm_irqs_off()
switch_slb
load_slb++
/*
* load_slb becomes 0 here
* and we evict an entry from
* the preload cache with
* preload_age(). We still
* keep HW SLB and preload
* cache in sync, that is
* because all HW SLB entries
* anyways gets evicted in
* switch_slb during SLBIA.
* We then only add those
* entries back in HW SLB,
* which are currently
* present in preload_cache
* (after eviction).
*/
load_elf_binary continues...
setup_new_exec()
slb_setup_new_exec()
sched_switch event
sched_migrate_task migrates
process P to cpu-0
context_switch from swapper/0 to Process P
switch_mm_irqs_off()
/*
* Since both prev and next mm struct are same we don't call
* switch_mmu_context(). This will cause the HW SLB and SW preload
* cache to go out of sync in preload_new_slb_context. Because there
* was an SLB entry which was evicted from both HW and preload cache
* on cpu-1. Now later in preload_new_slb_context(), when we will try
* to add the same preload entry again, we will add this to the SW
* preload cache and then will add it to the HW SLB. Since on cpu-0
* this entry was never invalidated, hence adding this entry to the HW
* SLB will cause a SLB multi-hit error.
*/
load_elf_binary continues...
START_THREAD
start_thread
preload_new_slb_context
/*
* This tries to add a new EA to preload cache which was earlier
* evicted from both cpu-1 HW SLB and preload cache. This caused the
* HW SLB of cpu-0 to go out of sync with the SW preload cache. The
* reason for this was, that when we context switched back on CPU-0,
* we should have ideally called switch_mmu_context() which will
* bring the HW SLB entries on CPU-0 in sync with SW preload cache
* entries by setting up the mmu context properly. But we didn't do
* that since the prev mm_struct running on cpu-0 was same as the
* next mm_struct (which is true for swapper / kernel threads). So
* now when we try to add this new entry into the HW SLB of cpu-0,
* we hit a SLB multi-hit error.
*/
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
0x7fffceb5ffff (unreliable)
preload_new_slb_context+0x100/0x1a0
start_thread+0x26c/0x420
load_elf_binary+0x1b04/0x1c40
bprm_execve+0x358/0x680
do_execveat_common+0x1f8/0x240
sys_execve+0x58/0x70
system_call_exception+0x114/0x300
system_call_common+0x160/0x2c4
From the above analysis, during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb. However, preload_new_slb_context and slb_setup_new_exec
also attempt to load some of the same entries, which can trigger a
multi-hit. In most cases, these additional preloads simply hit existing
entries and add nothing new. Removing these functions avoids redundant
preloads and eliminates the multi-hit issue. This patch removes these
two functions.
We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.
Without this patch: 129041 ops/sec
With this patch: 129341 ops/sec
We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.
SLB faults without this patch: 19727
SLB faults with this patch: 19786
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
Cc: stable@vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com
|
|
On 32-bit book3s with hash-MMUs, tlb_flush() was a no-op. This was
unnoticed because all uses until recently were for unmaps, and thus
handled by __tlb_remove_tlb_entry().
After commit 4a18419f71cd ("mm/mprotect: use mmu_gather") in kernel 5.19,
tlb_gather_mmu() started being used for mprotect as well. This caused
mprotect to simply not work on these machines:
int *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
*ptr = 1; // force HPTE to be created
mprotect(ptr, 4096, PROT_READ);
*ptr = 2; // should segfault, but succeeds
Fixed by making tlb_flush() actually flush TLB pages. This finally
agrees with the behaviour of book3s64's tlb_flush().
Fixes: 4a18419f71cd ("mm/mprotect: use mmu_gather")
Cc: stable@vger.kernel.org
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251116-vasi-mprotect-g3-v3-1-59a9bd33ba00@vasilevsky.ca
|
|
From Power10 processors onwards, each chip has 2 hemispheres. For LPARs
running on the PowerVM hypervisor, the hypervisor determines the
allocation of CPU groups to each LPAR, resulting in two LPARs with the
same number of CPUs potentially having different numbers of CPUs from
each hemisphere. Additionally, it is not feasible to ascertain the
hemisphere based solely on the CPU number.
Users wishing to assign their workload to all CPUs, or a subset of CPUs
within a specific hemisphere, encounter difficulties in identifying the
cpumask. To address this, it is proposed to expose hemisphere
information as a die in sysfs. This aligns with other architectures
and facilitates the identification of CPUs within the same hemisphere.
Tools such as lstopo can also access this information.
Please note: The hypervisor reveals the locality of the CPUs to
hemispheres only in dedicated mode. Consequently, in systems where
hemisphere information is unavailable, such as shared LPARs, the
die_cpus information in sysfs will mirror package_cpus, with
die_id set to -1.
Without this change:
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39
With this change:
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/die_cpus:000000,00000000,00ff0000
/sys/devices/system/cpu/cpu16/topology/die_cpus_list:16-23
/sys/devices/system/cpu/cpu16/topology/die_id:2
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39
snipped lstopo-no-graphics o/p
Group0 L#0 (total=8747584KB)
Package L#0 (total=3564096KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
NUMANode L#0 (P#0 local=3564096KB total=3564096KB)
Die L#0 (P#0)
Core L#0 (P#0)
<snipped>
Package L#1 (total=5183488KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
NUMANode L#1 (P#1 local=5183488KB total=5183488KB)
Die L#2 (P#2)
Core L#2 (P#16)
L3Cache L#4 (size=4096KB linesize=128 ways=16)
L2Cache L#4 (size=1024KB linesize=128 ways=8)
L1dCache L#4 (size=32KB linesize=128 ways=8)
L1iCache L#4 (size=48KB linesize=128 ways=6)
PU L#16 (P#16)
PU L#17 (P#18)
PU L#18 (P#20)
PU L#19 (P#22)
L3Cache L#5 (size=4096KB linesize=128 ways=16)
L2Cache L#5 (size=1024KB linesize=128 ways=8)
L1dCache L#5 (size=32KB linesize=128 ways=8)
L1iCache L#5 (size=48KB linesize=128 ways=6)
PU L#20 (P#17)
PU L#21 (P#19)
PU L#22 (P#21)
PU L#23 (P#23)
Die L#3 (P#3)
Core L#3 (P#24)
L3Cache L#6 (size=4096KB linesize=128 ways=16)
L2Cache L#6 (size=1024KB linesize=128 ways=8)
L1dCache L#6 (size=32KB linesize=128 ways=8)
L1iCache L#6 (size=48KB linesize=128 ways=6)
PU L#24 (P#24)
PU L#25 (P#26)
PU L#26 (P#28)
PU L#27 (P#30)
L3Cache L#7 (size=4096KB linesize=128 ways=16)
L2Cache L#7 (size=1024KB linesize=128 ways=8)
L1dCache L#7 (size=32KB linesize=128 ways=8)
L1iCache L#7 (size=48KB linesize=128 ways=6)
PU L#28 (P#25)
PU L#29 (P#27)
PU L#30 (P#29)
PU L#31 (P#31)
Core L#4 (P#32)
L3Cache L#8 (size=4096KB linesize=128 ways=16)
L2Cache L#8 (size=1024KB linesize=128 ways=8)
L1dCache L#8 (size=32KB linesize=128 ways=8)
L1iCache L#8 (size=48KB linesize=128 ways=6)
PU L#32 (P#32)
PU L#33 (P#34)
PU L#34 (P#36)
PU L#35 (P#38)
L3Cache L#9 (size=4096KB linesize=128 ways=16)
L2Cache L#9 (size=1024KB linesize=128 ways=8)
L1dCache L#9 (size=32KB linesize=128 ways=8)
L1iCache L#9 (size=48KB linesize=128 ways=6)
PU L#36 (P#33)
PU L#37 (P#35)
PU L#38 (P#37)
PU L#39 (P#39)
Group0 L#1 (total=7736896KB)
Package L#2 (total=5170880KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
NUMANode L#2 (P#2 local=5170880KB total=5170880KB)
Die L#4 (P#4)
<snipped>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251112074859.814087-1-srikar@linux.ibm.com
|
|
With the generic crashkernel reservation, the kernel emits the following
warning on powerpc:
WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/mem.c:341 add_system_ram_resources+0xfc/0x180
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-auto-12607-g5472d60c129f #1 VOLUNTARY
Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
NIP: c00000000201de3c LR: c00000000201de34 CTR: 0000000000000000
REGS: c000000127cef8a0 TRAP: 0700 Not tainted (6.17.0-auto-12607-g5472d60c129f)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84000840 XER: 20040010
CFAR: c00000000017eed0 IRQMASK: 0
GPR00: c00000000201de34 c000000127cefb40 c0000000016a8100 0000000000000001
GPR04: c00000012005aa00 0000000020000000 c000000002b705c8 0000000000000000
GPR08: 000000007fffffff fffffffffffffff0 c000000002db8100 000000011fffffff
GPR12: c00000000201dd40 c000000002ff0000 c0000000000112bc 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000015a3808
GPR24: c00000000200468c c000000001699888 0000000000000106 c0000000020d1950
GPR28: c0000000014683f8 0000000081000200 c0000000015c1868 c000000002b9f710
NIP [c00000000201de3c] add_system_ram_resources+0xfc/0x180
LR [c00000000201de34] add_system_ram_resources+0xf4/0x180
Call Trace:
add_system_ram_resources+0xf4/0x180 (unreliable)
do_one_initcall+0x60/0x36c
do_initcalls+0x120/0x220
kernel_init_freeable+0x23c/0x390
kernel_init+0x34/0x26c
ret_from_kernel_user_thread+0x14/0x1c
This warning occurs due to a conflict between crashkernel and System RAM
iomem resources.
The generic crashkernel reservation adds the crashkernel memory range to
/proc/iomem during early initialization. Later, all memblock ranges are
added to /proc/iomem as System RAM. If the crashkernel region overlaps
with any memblock range, it causes a conflict while adding those memblock
regions as iomem resources, triggering the above warning. The conflicting
memblock regions are then omitted from /proc/iomem.
For example, if the following crashkernel region is added to /proc/iomem:
20000000-11fffffff : Crash kernel
then the following System RAM memblock regions fail to be inserted:
00000000-7fffffff : System RAM
80000000-257fffffff : System RAM
Fix this by not adding the crashkernel memory to /proc/iomem on powerpc.
Introduce an architecture hook to let each architecture decide whether to
export the crashkernel region to /proc/iomem.
For more info check out commit c40dd2f766440 ("powerpc: Add System RAM
to /proc/iomem") and commit bce074bdbc36 ("powerpc: insert System RAM
resource to prevent crashkernel conflict")
Note: Before switching to the generic crashkernel reservation, powerpc
never exported the crashkernel region to /proc/iomem.
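One common way to express such an opt-out is a __weak hook that an
architecture overrides; the hook name below is hypothetical, only
insert_resource(), iomem_resource and crashk_res are existing kernel
symbols:
/* Generic side (illustrative): keep exporting the region by default. */
bool __weak arch_crashkernel_add_to_iomem(void)
{
        return true;
}

static void crashkernel_insert_resource(void)
{
        if (!arch_crashkernel_add_to_iomem())
                return;         /* powerpc would opt out here */
        insert_resource(&iomem_resource, &crashk_res);
}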
Link: https://lkml.kernel.org/r/20251016142831.144515-1-sourabhjain@linux.ibm.com
Fixes: e3185ee438c2 ("powerpc/crash: use generic crashkernel reservation")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/90937fe0-2e76-4c82-b27e-7b8a7fe3ac69@linux.ibm.com/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Baoquan he <bhe@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
crashkernel= command line option") and commit ab475510e042 ("kdump:
implement reserve_crashkernel_cma") added CMA support for kdump
crashkernel reservation.
Extend crashkernel CMA reservation support to powerpc.
The following changes are made to enable CMA reservation on powerpc:
- Parse and obtain the CMA reservation size along with other crashkernel
parameters
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump
- Include the CMA-reserved ranges in the usable memory ranges for the
kdump kernel to use.
- Exclude the CMA-reserved ranges from the crash kernel memory to
prevent them from being exported through /proc/vmcore.
With the introduction of the CMA crashkernel regions,
crash_exclude_mem_range() needs to be called multiple times to exclude
both crashk_res and crashk_cma_ranges from the crash memory ranges. To
avoid repetitive logic for validating mem_ranges size and handling
reallocation when required, this functionality is moved to a new wrapper
function crash_exclude_mem_range_guarded().
To ensure proper CMA reservation, reserve_crashkernel_cma() is called
after pageblock_order is initialized.
Update kernel-parameters.txt to document CMA support for crashkernel on
powerpc architecture.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251107080334.708028-1-sourabhjain@linux.ibm.com
|
|
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:
bool foo(u32 __user *p, u32 val)
{
scoped_guard(pagefault)
unsafe_put_user(val, p, efault);
return true;
efault:
return false;
}
It ends up leaking the pagefault disable counter in the fault path. Clang
at least fails the build.
Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.
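The wrapping follows the usual local-label indirection; a rough sketch of
the put_user case (macro body simplified, not the exact header change):
/* The arch-provided arch_unsafe_put_user() targets a local label; the
 * final jump to the caller's label is then a plain goto, which both GCC
 * and Clang handle correctly when leaving an auto cleanup scope. */
#define unsafe_put_user(x, ptr, label)                          \
do {                                                            \
        __label__ local_fault;                                  \
        arch_unsafe_put_user(x, ptr, local_fault);              \
        break;                                                  \
local_fault:                                                    \
        goto label;                                             \
} while (0)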
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.356628509@linutronix.de
|
|
Adapt PowerPC DMA to use physical addresses in order to prepare code
to removal .map_page and .unmap_page.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-10-3bbfe3a25cdf@kernel.org
|
|
Pull x86 kvm updates from Paolo Bonzini:
"Generic:
- Rework almost all of KVM's exports to expose symbols only to KVM's
x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko
x86:
- Rework almost all of KVM x86's exports to expose symbols only to
KVM's vendor modules, i.e. to kvm-{amd,intel}.ko
- Add support for virtualizing Control-flow Enforcement Technology
(CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
(Shadow Stacks).
It is worth noting that while SHSTK and IBT can be enabled
separately in CPUID, it is not really possible to virtualize them
separately. Therefore, Intel processors will really allow both
SHSTK and IBT under the hood if either is made visible in the
guest's CPUID. The alternative would be to intercept
XSAVES/XRSTORS, which is not feasible for performance reasons
- Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
when completing userspace I/O. KVM has already committed to
allowing L2 to perform I/O at that point
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
MSR is supposed to exist for v2 PMUs
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside
of forced emulation and other contrived testing scenarios)
- Clean up the MSR APIs in preparation for CET and FRED
virtualization, as well as mediated vPMU support
- Clean up a pile of PMU code in anticipation of adding support for
mediated vPMUs
- Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
vmexits needed to faithfully emulate an I/O APIC for such guests
- Many cleanups and minor fixes
- Recover possible NX huge pages within the TDP MMU under read lock
to reduce guest jitter when restoring NX huge pages
- Return -EAGAIN during prefault if userspace concurrently
deletes/moves the relevant memslot, to fix an issue where
prefaulting could deadlock with the memslot update
x86 (AMD):
- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
supported
- Require a minimum GHCB version of 2 when starting SEV-SNP guests
via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
errors instead of latent guest failures
- Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
prevents unauthorized CPU accesses from reading the ciphertext of
SNP guest private memory, e.g. to attempt an offline attack. This
feature splits the shared SEV-ES/SEV-SNP ASID space into separate
ranges for SEV-ES and SEV-SNP guests, therefore a new module
parameter is needed to control the number of ASIDs that can be used
for VMs with CipherText Hiding vs. how many can be used to run
SEV-ES guests
- Add support for Secure TSC for SEV-SNP guests, which prevents the
untrusted host from tampering with the guest's TSC frequency, while
still allowing the VMM to configure the guest's TSC frequency
prior to launch
- Validate the XCR0 provided by the guest (via the GHCB) to avoid
bugs resulting from bogus XCR0 values
- Save an SEV guest's policy if and only if LAUNCH_START fully
succeeds to avoid leaving behind stale state (thankfully not
consumed in KVM)
- Explicitly reject non-positive effective lengths during SNP's
LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
them
- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
host's desired TSC_AUX, to fix a bug where KVM was keeping a
different vCPU's TSC_AUX in the host MSR until return to userspace
KVM (Intel):
- Preparation for FRED support
- Don't retry in TDX's anti-zero-step mitigation if the target
memslot is invalid, i.e. is being deleted or moved, to fix a
deadlock scenario similar to the aforementioned prefaulting case
- Misc bugfixes and minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: x86: Export KVM-internal symbols for sub-modules only
KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
KVM: Export KVM-internal symbols for sub-modules only
KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
KVM: VMX: Make CR4.CET a guest owned bit
KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
KVM: selftests: Add coverage for KVM-defined registers in MSRs test
KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
KVM: selftests: Extend MSRs test to validate vCPUs without supported features
KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
KVM: selftests: Add an MSR test to exercise guest/host and read/write
KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
KVM: x86: Define Control Protection Exception (#CP) vector
KVM: x86: Add human friendly formatting for #XM, and #VE
KVM: SVM: Enable shadow stack virtualization for SVM
KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
KVM: SVM: Pass through shadow stack MSRs as appropriate
KVM: SVM: Update dump_vmcb with shadow stack save area additions
KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves
performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool
permits Rust allocators to set NUMA node and large alignment when
performing slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
DAMOS_STAT's handling of the DAMON operations sets for virtual
address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
Baghdasaryan reduces mmap_lock contention during reads of
/proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song
performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides
code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides
a block layer speedup by optionally making the
huge_zero_page persistent, instead of releasing it when its refcount
falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
Stoakes turns mm_struct.flags into a bitmap, to end the constant
struggle with space shortage on 32-bit conflicting with 64-bit's
needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
code
- "selftests/mm: Fix false positives and skip unsupported tests" from
Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
from David Hildenbrand "allows individual processes to opt-out of
THP=always into THP=madvise, without affecting other workloads on the
system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and
https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling
improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our
folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap
selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that
function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces
the concept of "kernel file pages". Using these permits btrfs to
account its metadata pages to the root cgroup, rather than to the
cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from
Wei Yang provides some readability improvements to the page allocator
code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from
Brendan Jackman performs some code cleanups and deduplication under
tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific
implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an
indirection layer which now only redirects to a single thing
(zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
adjustments at various nth_page() callsites, eventually permitting
the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
creates a KASAN read-only mode for ARM, using that architecture's
memory tagging feature. It is felt that a read-only mode KASAN is
suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max
Kellermann makes quite a number of the MM API functions more accurate
about the constness of their arguments. This was getting in the way
of subsystems (in this case CEPH) when they attempt to improve
their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
code sites which were confused over when to use free_pages() vs
__free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
mapletree code accessible to Rust. Required by nouveau and by its
forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements" from David Hildenbrand adds a fix and some cleanups to
the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris
Li and Kairui Song is the first step along the path to implementing
"swap tables" - a new approach to swap allocation and state tracking
which is expected to yield speed and space improvements. This
patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
addresses a couple of minor issues in the relatively new memory
allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in
preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
Quanmin Yan makes some changes to DAMON in furtherance of supporting
arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
Anjum adds that compiler check to selftests code and fixes the
fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal
Order" from zhongjinji makes a number of improvements in the OOM
killer: mainly thawing a more appropriate group of victim threads so
they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from
SeongJae Park implements reliability and maintainability improvements
to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
SeongJae Park provides additional transparency to userspace clients
of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
some constraints on khugepaged's collapsing of anon VMAs. It also
increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
from Lorenzo Stoakes moves us further towards removal of
file_operations.mmap(). This patchset concentrates upon clearing up
the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
provides some fixes and improvements to mlock's tracking of large
folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
Donet Tom fixes several user-visible KSM stats inaccuracies across
forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
mm: swap: check for stable address space before operating on the VMA
mm: convert folio_page() back to a macro
mm/khugepaged: use start_addr/addr for improved readability
hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
alloc_tag: fix boot failure due to NULL pointer dereference
mm: silence data-race in update_hiwater_rss
mm/memory-failure: don't select MEMORY_ISOLATION
mm/khugepaged: remove definition of struct khugepaged_mm_slot
mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
hugetlb: increase number of reserving hugepages via cmdline
selftests/mm: add fork inheritance test for ksm_merging_pages counter
mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
drivers/base/node: fix double free in register_one_node()
mm: remove PMD alignment constraint in execmem_vmalloc()
mm/memory_hotplug: fix typo 'esecially' -> 'especially'
mm/rmap: improve mlock tracking for large folios
mm/filemap: map entire large folio faultaround
mm/fault: try to map the entire file folio in finish_fault()
mm/rmap: mlock large folios in try_to_unmap_one()
mm/rmap: fix a mlock race condition in folio_referenced_one()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- FC target fixes (Daniel)
- Authentication fixes and updates (Martin, Chris)
- Admin controller handling (Kamaljit)
- Target lockdep assertions (Max)
- Keep-alive updates for discovery (Alastair)
- Suspend quirk (Georg)
- MD pull request via Yu:
- Add support for a lockless bitmap.
A key feature of the new bitmap is that the IO fastpath is
lockless. If a user issues lots of write IO to the same bitmap
bit in a short time, only the first write has the additional
overhead of updating the bitmap bit; the following writes incur
no additional overhead.
Because only written data is resynced or recovered, there is no
need for a full disk resync/recovery when creating a new array or
replacing a disk.
- Switch ->getgeo() and ->bios_param() to using struct gendisk rather
than struct block_device.
- Rust block changes via Andreas. This series adds configuration via
configfs and remote completion to the rnull driver. The series also
includes a set of changes to the rust block device driver API: a few
cleanup patches, and a few features supporting the rnull changes.
The series removes the raw buffer formatting logic from
`kernel::block` and improves the logic available in `kernel::string`
to support the same use as the removed logic.
- floppy arch cleanups
- Reduce the number of dereferencing needed for ublk commands
- Restrict supported sockets for nbd. Mostly done to eliminate a class
of issues perpetually reported by syzbot using nonsensical socket
setups.
- A few s390 dasd block fixes
- Fix a few issues around atomic writes
- Improve DMA iteration for integrity requests
- Improve how iovecs are treated with regard to O_DIRECT alignment
constraints.
We used to require each segment to adhere to the constraints, now
only the request as a whole needs to.
- Clean up and improve p2p support, enabling use of p2p for metadata
payloads
- Improve locking of request lookup, using SRCU where appropriate
- Use page references properly for brd, avoiding very long RCU sections
- Fix ordering of recursively submitted IOs
- Clean up and improve updating nr_requests for a live device
- Various fixes and cleanups
* tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits)
s390/dasd: enforce dma_alignment to ensure proper buffer validation
s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request
ublk: remove redundant zone op check in ublk_setup_iod()
nvme: Use non zero KATO for persistent discovery connections
nvmet: add safety check for subsys lock
nvme-core: use nvme_is_io_ctrl() for I/O controller check
nvme-core: do ioccsz/iorcsz validation only for I/O controllers
nvme-core: add method to check for an I/O controller
blk-cgroup: fix possible deadlock while configuring policy
blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path
blk-mq: Fix more tag iteration function documentation
selftests: ublk: fix behavior when fio is not installed
ublk: don't access ublk_queue in ublk_unmap_io()
ublk: pass ublk_io to __ublk_complete_rq()
ublk: don't access ublk_queue in ublk_need_complete_req()
ublk: don't access ublk_queue in ublk_check_commit_and_fetch()
ublk: don't pass ublk_queue to ublk_fetch()
ublk: don't access ublk_queue in ublk_config_io_buf()
ublk: don't access ublk_queue in ublk_check_fetch_buf()
ublk: pass q_id and tag to __ublk_check_and_get_req()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nathan Chancellor:
- Extend modules.builtin.modinfo to include module aliases from
MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such
as kmod) can verify that a particular module alias will be handled by
a builtin module
- Bump the minimum version of LLVM for building the kernel to 15.0.0
- Upgrade several userspace API checks in headers_check.pl to errors
- Unify and consolidate CONFIG_WERROR / W=e handling
- Turn assembler and linker warnings into errors with CONFIG_WERROR /
W=e
- Respect CONFIG_WERROR / W=e when building userspace programs
(userprogs)
- Enable -Werror unconditionally when building host programs
(hostprogs)
- Support copy_file_range() and data segment alignment in gen_init_cpio
to improve performance on filesystems that support reflinks such as
btrfs and XFS
- Miscellaneous small changes to scripts and configuration files
* tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits)
modpost: Initialize builtin_modname to stop SIGSEGVs
Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds
kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o
modpost: Create modalias for builtin modules
modpost: Add modname to mod_device_table alias
scsi: Always define blogic_pci_tbl structure
kbuild: extract modules.builtin.modinfo from vmlinux.unstripped
kbuild: keep .modinfo section in vmlinux.unstripped
kbuild: always create intermediate vmlinux.unstripped
s390: vmlinux.lds.S: Reorder sections
KMSAN: Remove tautological checks
objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY
lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs
riscv: Unconditionally use linker relaxation
riscv: Remove version check for LTO_CLANG selects
powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault()
mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER
arm64: Remove tautological LLVM Kconfig conditions
ARM: Clean up definition of ARM_HAS_GROUP_RELOCS
...
|
|
Rework the vast majority of KVM's exports to expose symbols only to KVM
submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
With few exceptions, KVM's exported APIs are intended (and safe) for KVM-
internal usage only.
Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal
exports, as they are needed by VFIO, and are generally safe for external
usage (though ideally even the get/put APIs would be KVM-internal, and
VFIO would pin a VM by grabbing a reference to its associated file).
Implement a framework in kvm_types.h in anticipation of providing a macro
to restrict KVM-specific kernel exports, i.e. to provide symbol exports
for KVM if and only if KVM is built as one or more modules.
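A rough sketch of what such a framework could look like, assuming the
EXPORT_SYMBOL_GPL_FOR_MODULES() helper for module-restricted exports; the
macro name and the module list below are illustrative, the commit only
states that the framework lives in kvm_types.h:

	#if IS_MODULE(CONFIG_KVM)
	#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(sym) \
		EXPORT_SYMBOL_GPL_FOR_MODULES(sym, "kvm,kvm-intel,kvm-amd")
	#else
	/* KVM is built-in (or not built at all): no export is needed. */
	#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(sym)
	#endif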
Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Make migrate_{en,dis}able() inline, to improve performance
(Menglong Dong)
- Move STDL_INIT() functions out-of-line (Peter Zijlstra)
- Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)
Fair scheduling:
- Defer throttling to when tasks exit to user-space, to reduce the
chance & impact of throttle-preemption with held locks and other
resources (Aaron Lu, Valentin Schneider)
- Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
warning was getting triggered on certain topologies (Peter
Zijlstra)
Misc cleanups & fixes:
- Header cleanups (Menglong Dong)
- Fix race in push_dl_task() (Harshit Agarwal)"
* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix some typos in include/linux/preempt.h
sched: Make migrate_{en,dis}able() inline
rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
sched/fair: Do not balance task to a throttled cfs_rq
sched/fair: Do not special case tasks in throttled hierarchy
sched/fair: update_cfs_group() for throttled cfs_rqs
sched/fair: Propagate load for throttled cfs_rq
sched/fair: Get rid of throttled_lb_pair()
sched/fair: Task based throttle time accounting
sched/fair: Switch to task based throttle model
sched/fair: Implement throttle task work and related helpers
sched/fair: Add related data structure for task based throttle
sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
sched: Move STDL_INIT() functions out-of-line
sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
sched/deadline: Fix race in push_dl_task()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for BPF arena and arena atomics
- Patches to switch to msi parent domain (per-device MSI domains)
- Add a lock contention tracepoint in the queued spinlock slowpath
- Fixes for underflow in pseries/powernv msi and pci paths
- Switch from legacy-of-mm-gpiochip dependency to platform driver
- Fixes for handling TLB misses
- Introduce support for powerpc papr-hvpipe
- Add vpa-dtl PMU driver for pseries platform
- Misc fixes and cleanups
Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.
* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
genirq/msi: Remove msi_post_free()
powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
powerpc/time: Expose boot_tb via accessor
powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
powerpc/fprobe: fix updated fprobe for function-graph tracer
powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
powerpc64/modules: replace stub allocation sentinel with an explicit counter
powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
powerpc/ftrace: ensure ftrace record ops are always set for NOPs
powerpc/603: Really copy kernel PGD entries into all PGDIRs
powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
powerpc/pseries: HVPIPE changes to support migration
powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
powerpc/pseries: Enable HVPIPE event message interrupt
...
|
|
Fix the following build errors with CONFIG_UAPI_HEADER_TEST:
./usr/include/asm/papr-hvpipe.h:16:9: error: unknown type name 'u8'
16 | u8 version;
./usr/include/asm/papr-hvpipe.h:17:9: error: unknown type name 'u8'
17 | u8 reserved[3];
./usr/include/asm/papr-hvpipe.h:18:9: error: unknown type name 'u32'
18 | u32 flags;
./usr/include/asm/papr-hvpipe.h:19:9: error: unknown type name 'u8'
19 | u8 reserved2[40];
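The fix is to use the UAPI-safe __u8/__u32 types in the exported header,
roughly along these lines (layout taken from the errors above):

	#include <linux/types.h>

	struct papr_hvpipe_hdr {
		__u8  version;
		__u8  reserved[3];
		__u32 flags;
		__u8  reserved2[40];
	};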
Fixes: 043439ad1a23c ("powerpc/pseries: Define papr-hvpipe ioctl")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250922091108.1483970-1-haren@linux.ibm.com
|
|
- Define an accessor function get_boot_tb() to safely return the boot_tb
value. It is only needed when running in SPLPAR environments, so the
accessor is built conditionally under CONFIG_PPC_SPLPAR.
- Tag boot_tb as __ro_after_init since it is written once at initialization
and never updated afterwards.
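A sketch of how such an accessor might look (illustrative; the exact
placement and any fallback behaviour are assumptions):

	#ifdef CONFIG_PPC_SPLPAR
	/* boot_tb is written once during early boot and read-only afterwards. */
	static inline u64 get_boot_tb(void)
	{
		return boot_tb;
	}
	#endif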
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Tejas Manhas <tejas05@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915102947.26681-2-atrajeev@linux.ibm.com
|
|
Patch series "kasan: unify kasan_enabled() and remove arch-specific
implementations", v6.
This patch series addresses the fragmentation in KASAN initialization
across architectures by introducing a unified approach that eliminates
duplicate static keys and arch-specific kasan_arch_is_ready()
implementations.
The core issue is that different architectures have inconsistent approaches
to KASAN readiness tracking:
- PowerPC, LoongArch, and UML each implement their own kasan_arch_is_ready()
- Only HW_TAGS mode had a unified static key (kasan_flag_enabled)
- Generic and SW_TAGS modes relied on arch-specific solutions
or always-on behavior
This patch (of 2):
Introduce CONFIG_ARCH_DEFER_KASAN to identify architectures [1] that need
to defer KASAN initialization until shadow memory is properly set up, and
unify the static key infrastructure across all KASAN modes.
[1] PowerPC, UML, and LoongArch select ARCH_DEFER_KASAN.
The core issue is that different architectures have inconsistent approaches
to KASAN readiness tracking:
- PowerPC, LoongArch, and UML each implement their own
kasan_arch_is_ready()
- Only HW_TAGS mode had a unified static key (kasan_flag_enabled)
- Generic and SW_TAGS modes relied on arch-specific solutions or always-on
behavior
This patch addresses the fragmentation in KASAN initialization across
architectures by introducing a unified approach that eliminates duplicate
static keys and arch-specific kasan_arch_is_ready() implementations.
Let's replace kasan_arch_is_ready() with the existing kasan_enabled() check,
which tests the static key when the architecture selects ARCH_DEFER_KASAN
or has HW_TAGS mode support. For other architectures, kasan_enabled()
resolves to a compile-time constant.
Now KASAN users can use a single kasan_enabled() check everywhere.
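Roughly, the unified check could look like the sketch below (illustrative
only; the real patch may differ in detail): architectures that defer KASAN
flip a static key once shadow memory is ready, everyone else resolves the
check at compile time.

	#if defined(CONFIG_ARCH_DEFER_KASAN) || defined(CONFIG_KASAN_HW_TAGS)
	DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled);

	static __always_inline bool kasan_enabled(void)
	{
		return static_branch_likely(&kasan_flag_enabled);
	}
	#else
	static __always_inline bool kasan_enabled(void)
	{
		return IS_ENABLED(CONFIG_KASAN);
	}
	#endif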
Link: https://lkml.kernel.org/r/20250810125746.1105476-1-snovitoll@gmail.com
Link: https://lkml.kernel.org/r/20250810125746.1105476-2-snovitoll@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217049
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> #powerpc
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: David Gow <davidgow@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Marco Elver <elver@google.com>
Cc: Qing Zhang <zhangqing@loongson.cn>
Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
PAGE_KERNEL_TEXT is an old macro that is used to tell the kernel whether
kernel text has to be mapped read-only or read-write based on build
time options.
But nowadays, with functionalities like jump_labels, static links,
etc ... more or less all kernels need to be read-write at some
point, and some combinations of configs failed to work due to
inaccurate setting of PAGE_KERNEL_TEXT. On the other hand, today
we have CONFIG_STRICT_KERNEL_RWX which implements more controlled
access to kernel modifications.
Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
possible options that may imply kernel text modification, always
set kernel text read-write at startup and rely on
CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
Do this by passing PAGE_KERNEL_X to map_kernel_page() in
__mapin_ram_chunk() instead of passing PAGE_KERNEL_TEXT. Once
this is done, the only remaining user of PAGE_KERNEL_TEXT is
mmu_mark_initmem_nx() which uses it in a call to setibat().
As setibat() ignores the RW/RO, we can seamlessly replace
PAGE_KERNEL_TEXT by PAGE_KERNEL_X here as well and get rid of
PAGE_KERNEL_TEXT completely.
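An illustrative sketch of what the mapping loop ends up doing (variable
names abridged and assumed, not the literal diff): kernel text is simply
mapped with PAGE_KERNEL_X at startup, and CONFIG_STRICT_KERNEL_RWX tightens
the protection later.

	/* inside __mapin_ram_chunk(), sketch only */
	for (; s < top; s += PAGE_SIZE) {
		ktext = core_kernel_text(v);
		map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
		v += PAGE_SIZE;
		p += PAGE_SIZE;
	}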
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8e2d793abf87ae3efb8f6dce10f974ac0eda61b8.1757412205.git.christophe.leroy@csgroup.eu
|
|
Since commit 4346ba160409 ("fprobe: Rewrite fprobe on function-graph
tracer"), FPROBE depends on HAVE_FUNCTION_GRAPH_FREGS. With previous
patch adding HAVE_FUNCTION_GRAPH_FREGS for powerpc, FPROBE can be
enabled on powerpc. But with the commit b5fa903b7f7c ("fprobe: Add
fprobe_header encoding feature"), asm/fprobe.h header is needed to
define arch dependent encode/decode macros. The fprobe header MSB
pattern on powerpc is not 0xf. So, define FPROBE_HEADER_MSB_PATTERN
expected on powerpc.
Also, commit 762abbc0d09f ("fprobe: Use ftrace_regs in fprobe exit
handler") introduced HAVE_FTRACE_REGS_HAVING_PT_REGS for archs that
have pt_regs in ftrace_regs. Advertise that on powerpc to reuse
common definitions like ftrace_partial_regs().
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Bodkhe <aditya.b1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916044035.29033-2-adityab1@linux.ibm.com
|
|
commit a1be9ccc57f0 ("function_graph: Support recording and printing the
return value of function") introduced support for function graph return
value tracing.
Additionally, commit a3ed4157b7d8 ("fgraph: Replace fgraph_ret_regs with
ftrace_regs") further refactored and optimized the implementation,
making `struct fgraph_ret_regs` unnecessary.
This patch enables the above modifications for powerpc, ensuring that
function graph return value tracing is available on this architecture.
In this patch we have redefined two functions:
- 'ftrace_regs_get_return_value()' - the existing implementation on
ppc returns the negated return value under some conditions that are
not relevant to our patch.
- 'ftrace_regs_get_frame_pointer()' - always returns 0 in the current code.
We also allocate stack space equivalent to 'SWITCH_FRAME_SIZE',
allowing us to directly use predefined offsets like 'GPR3' and 'GPR4';
this keeps the code clean and consistent with already defined offsets.
After this patch, a v6.14+ kernel can also be built with FPROBE on powerpc,
but there are a few other build and runtime dependencies for FPROBE to
work properly. The next patch addresses them.
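For illustration only (register choices follow the powerpc ABI; this is not
the literal patch), the redefined helpers could look roughly like:

	/* r3 holds the function return value, r1 is the stack/frame pointer. */
	static __always_inline unsigned long
	ftrace_regs_get_return_value(const struct ftrace_regs *fregs)
	{
		return arch_ftrace_regs(fregs)->regs.gpr[3];
	}

	static __always_inline unsigned long
	ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs)
	{
		return arch_ftrace_regs(fregs)->regs.gpr[1];
	}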
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Aditya Bodkhe <adityab1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916044035.29033-1-adityab1@linux.ibm.com
|
|
The logic for allocating ppc64_stub_entry trampolines in the .stubs
section relies on an inline sentinel, where a NULL .funcdata member
indicates an available slot.
While preceding commits fixed the initialization bugs that led to ftrace
stub corruption, the sentinel-based approach remains fragile: it depends
on an implicit convention between subsystems modifying different
struct types in the same memory area.
Replace the sentinel with an explicit counter, module->arch.num_stubs.
Instead of iterating through memory to find a NULL marker, the module
loader uses this counter as the boundary for the next free slot.
This simplifies the allocation code, hardens it against future changes
to stub structures, and removes the need for an extra relocation slot
previously reserved to terminate the sentinel search.
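A simplified sketch of the counter-based allocation (illustrative; the
helper name and anything not mentioned above, such as the stubs_section
index, are assumptions):

	/* The next free trampoline slot is simply the current count. */
	static struct ppc64_stub_entry *get_free_stub(const Elf64_Shdr *sechdrs,
						      struct module *me)
	{
		struct ppc64_stub_entry *stubs =
			(void *)sechdrs[me->arch.stubs_section].sh_addr;

		return &stubs[me->arch.num_stubs++];
	}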
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-4-joe.lawrence@redhat.com
|
|
Commit 82ef440f9a38 ("powerpc/603: Copy kernel PGD entries into all
PGDIRs and preallocate execmem page tables") was supposed to extend
to powerpc 603 the copy of kernel PGD entries into all PGDIRs
implemented in a previous patch on the 8xx. But 603 is book3s/32 and
uses a duplicate of pgd_alloc() defined in another header.
So really do the copy at the correct place for the 603.
Fixes: 82ef440f9a38 ("powerpc/603: Copy kernel PGD entries into all PGDIRs and preallocate execmem page tables")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/752ab7514cae089a2dd7cc0f3d5e35849f76adb9.1755757797.git.christophe.leroy@csgroup.eu
|
|
The partition uses the “Hypervisor Pipe OS Enablement Notification”
system parameter token (value = 64) to enable / disable hvpipe in
the hypervisor. Once hvpipe is enabled, the hypervisor notifies the
OS if a payload is pending for that partition from any source.
This system parameter token takes 1 byte of data, with
1 = Enable and 0 = Disable.
Enable hvpipe in the hypervisor with ibm,set-system-parameter
RTAS after registering hvpipe event source interrupt.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-9-haren@linux.ibm.com
|
|
Define HVPIPE specific macros which are needed to support
ibm,send-hvpipe-msg and ibm,receive-hvpipe-msg RTAS calls
and used to handle HVPIPE message events.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-3-haren@linux.ibm.com
|