| Age | Commit message (Collapse) | Author |
|
Add TDP data for laptop model GA503QM.
Signed-off-by: Denis Benato <denis.benato@linux.dev>
Link: https://patch.msgid.link/20260309183559.433555-2-denis.benato@linux.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
I have been using a linux.dev email since that is hugely better than gmail.
Signed-off-by: Denis Benato <denis.benato@linux.dev>
Signed-off-by: Denis Benato <benato.denis96@gmail.com>
Link: https://patch.msgid.link/20260304141102.63732-1-denis.benato@linux.dev
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Commit 005e8dddd497 ("PM: hibernate: don't store zero pages in the
image file") added an optimization to skip zero-filled pages in the
hibernation image. On restore, zero pages are handled internally by
snapshot_write_next() in a loop that processes them without returning
to the caller.
With the userspace restore interface, writing the last non-zero page
to /dev/snapshot is followed by the SNAPSHOT_ATOMIC_RESTORE ioctl. At
this point there are no more calls to snapshot_write_next() so any
trailing zero pages are not processed, snapshot_image_loaded() fails
because handle->cur is smaller than expected, the ioctl returns -EPERM
and the image is not restored.
The in-kernel restore path is not affected by this because the loop in
load_image() in swap.c calls snapshot_write_next() until it returns 0.
It is this final call that drains any trailing zero pages.
Fixed by calling snapshot_write_next() in snapshot_write_finalize(),
giving the kernel the chance to drain any trailing zero pages.
Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
Signed-off-by: Alberto Garcia <berto@igalia.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Link: https://patch.msgid.link/ef5a7c5e3e3dbd17dcb20efaa0c53a47a23498bb.1773075892.git.berto@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
A recently reported issue highlighted that the cached requested_freq
is not guaranteed to stay in sync with policy->cur. If the platform
changes the actual CPU frequency after the governor sets one (e.g.
due to platform-specific frequency scaling) and a re-sync occurs
later, policy->cur may diverge from requested_freq.
This can lead to incorrect behavior in the conservative governor.
For example, the governor may assume the CPU is already running at
the maximum frequency and skip further increases even though there
is still headroom.
Avoid this by resetting the cached requested_freq to policy->cur on
detecting a change in policy limits.
Reported-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Tested-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://lore.kernel.org/all/20260210115458.3493646-1-zhenglifeng1@huawei.com/
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/d846a141a98ac0482f20560fcd7525c0f0ec2f30.1773999467.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The commit 6db0f533d320 ("cpufreq: preserve freq_table_sorted
across suspend/hibernate") unintentionally made a change where
cpufreq_frequency_table_cpuinfo() isn't getting called anymore
for old policies getting re-initialized.
This leads to potentially invalid values of policy->max and
policy->cpuinfo_max_freq.
Fix the issue by reverting the original commit and adding the condition
for just the sorting function.
Fixes: 6db0f533d320 ("cpufreq: preserve freq_table_sorted across suspend/hibernate")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 6.19+ <stable@vger.kernel.org> # 6.19+
Link: https://patch.msgid.link/65ba5c45749267c82e8a87af3dc788b37a0b3f48.1773998611.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Move FSGSBASE enablement from identify_cpu() to cpu_init_exception_handling()
to ensure it is enabled before any exceptions can occur on both boot and
secondary CPUs.
== Background ==
Exception entry code (paranoid_entry()) uses ALTERNATIVE patching based on
X86_FEATURE_FSGSBASE to decide whether to use RDGSBASE/WRGSBASE instructions
or the slower RDMSR/SWAPGS sequence for saving/restoring GSBASE.
On boot CPU, ALTERNATIVE patching happens after enabling FSGSBASE in CR4.
When the feature is available, the code is permanently patched to use
RDGSBASE/WRGSBASE, which require CR4.FSGSBASE=1 to execute without triggering
== Boot Sequence ==
Boot CPU (with CR pinning enabled):
trap_init()
cpu_init() <- Uses unpatched code (RDMSR/SWAPGS)
x2apic_setup()
...
arch_cpu_finalize_init()
identify_boot_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) # Enables the feature
# This becomes part of cr4_pinned_bits
...
alternative_instructions() <- Patches code to use RDGSBASE/WRGSBASE
Secondary CPUs (with CR pinning enabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=1
set implicitly via cr4_pinned_bits
cpu_init() <- exceptions work because FSGSBASE is
already enabled
Secondary CPU (with CR pinning disabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=0
cpu_init()
x2apic_setup()
rdmsrq(MSR_IA32_APICBASE) <- Triggers #VC in SNP guests
exc_vmm_communication()
paranoid_entry() <- Uses RDGSBASE with CR4.FSGSBASE=0
(patched code)
...
ap_starting()
identify_secondary_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) <- Enables the feature, which is
too late
== CR Pinning ==
Currently, for secondary CPUs, CR4.FSGSBASE is set implicitly through
CR-pinning: the boot CPU sets it during identify_cpu(), it becomes part of
cr4_pinned_bits, and cr4_init() applies those pinned bits to secondary CPUs.
This works but creates an undocumented dependency between cr4_init() and the
pinning mechanism.
== Problem ==
Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1. Upcoming changes to CR pinning
behavior will break the implicit dependency, causing secondary CPUs to
generate #UD.
This issue manifests itself on AMD SEV-SNP guests, where the rdmsrq() in
x2apic_setup() triggers a #VC exception early during cpu_init(). The #VC
handler (exc_vmm_communication()) executes the patched paranoid_entry() path.
Without CR4.FSGSBASE enabled, RDGSBASE instructions trigger #UD.
== Fix ==
Enable FSGSBASE explicitly in cpu_init_exception_handling() before loading
exception handlers. This makes the dependency explicit and ensures both
boot and secondary CPUs have FSGSBASE enabled before paranoid_entry()
executes.
Fixes: c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20260318075654.1792916-2-nikunj@amd.com
|
|
devm_regmap_init_mmio() returns an ERR_PTR() on failure, not NULL.
The original code checked for NULL which would never trigger on error,
potentially leading to an invalid pointer dereference.
Use IS_ERR() and PTR_ERR() to properly handle the error case.
Fixes: e88500247dc3 ("gpio: add QIXIS FPGA GPIO controller")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260320-qixis-v1-1-a8efc22e8945@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
Remove the redundant post-parse validation switch. By the time that
block is reached, xfs_attri_validate() has already guaranteed all name
lengths are non-zero via xfs_attri_validate_namelen(), and
xfs_attri_validate_name_iovec() has already returned -EFSCORRUPTED for
NULL names. For the REMOVE case, attr_value and value_len are
structurally guaranteed to be NULL/zero because the parsing loop only
populates them when value_len != 0. All checks in that switch are
therefore dead code.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The ri_total checks for SET/REPLACE operations are hardcoded to 3,
but xfs_attri_item_size() only emits a value iovec when value_len > 0,
so ri_total is 2 when value_len == 0.
For PPTR_SET/PPTR_REMOVE/PPTR_REPLACE, value_len is validated by
xfs_attri_validate() to be exactly sizeof(struct xfs_parent_rec) and
is never zero, so their hardcoded checks remain correct.
This problem may cause log recovery failures. The following script can be
used to reproduce the problem:
#!/bin/bash
mkfs.xfs -f /dev/sda
mount /dev/sda /mnt/test/
touch /mnt/test/file
for i in {1..200}; do
attr -s "user.attr_$i" -V "value_$i" /mnt/test/file > /dev/null
done
echo 1 > /sys/fs/xfs/debug/larp
echo 1 > /sys/fs/xfs/sda/errortag/larp
attr -s "user.zero" -V "" /mnt/test/file
echo 0 > /sys/fs/xfs/sda/errortag/larp
umount /mnt/test
mount /dev/sda /mnt/test/ # mount failed
Fix this by deriving the expected count dynamically as "2 + !!value_len"
for SET/REPLACE operations.
Cc: stable@vger.kernel.org # v6.9
Fixes: ad206ae50eca ("xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When inactivating an inode with node-format extended attributes,
xfs_attr3_node_inactive() invalidates all child leaf/node blocks via
xfs_trans_binval(), but intentionally does not remove the corresponding
entries from their parent node blocks. The implicit assumption is that
xfs_attr_inactive() will truncate the entire attr fork to zero extents
afterwards, so log recovery will never reach the root node and follow
those stale pointers.
However, if a log shutdown occurs after the leaf/node block cancellations
commit but before the attr bmap truncation commits, this assumption
breaks. Recovery replays the attr bmap intact (the inode still has
attr fork extents), but suppresses replay of all cancelled leaf/node
blocks, maybe leaving them as stale data on disk. On the next mount,
xlog_recover_process_iunlinks() retries inactivation and attempts to
read the root node via the attr bmap. If the root node was not replayed,
reading the unreplayed root block triggers a metadata verification
failure immediately; if it was replayed, following its child pointers
to unreplayed child blocks triggers the same failure:
XFS (pmem0): Metadata corruption detected at
xfs_da3_node_read_verify+0x53/0x220, xfs_da3_node block 0x78
XFS (pmem0): Unmount and run xfs_repair
XFS (pmem0): First 128 bytes of corrupted metadata buffer:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (pmem0): metadata I/O error in "xfs_da_read_buf+0x104/0x190" at daddr 0x78 len 8 error 117
Fix this in two places:
In xfs_attr3_node_inactive(), after calling xfs_trans_binval() on a
child block, immediately remove the entry that references it from the
parent node in the same transaction. This eliminates the window where
the parent holds a pointer to a cancelled block. Once all children are
removed, the now-empty root node is converted to a leaf block within the
same transaction. This node-to-leaf conversion is necessary for crash
safety. If the system shutdown after the empty node is written to the
log but before the second-phase bmap truncation commits, log recovery
will attempt to verify the root block on disk. xfs_da3_node_verify()
does not permit a node block with count == 0; such a block will fail
verification and trigger a metadata corruption shutdown. on the other
hand, leaf blocks are allowed to have this transient state.
In xfs_attr_inactive(), split the attr fork truncation into two explicit
phases. First, truncate all extents beyond the root block (the child
extents whose parent references have already been removed above).
Second, invalidate the root block and truncate the attr bmap to zero in
a single transaction. The two operations in the second phase must be
atomic: as long as the attr bmap has any non-zero length, recovery can
follow it to the root block, so the root block invalidation must commit
together with the bmap-to-zero truncation.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Factor out wrapper xfs_attr3_leaf_init function, which exported for
external use.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Factor out wrapper xfs_attr3_node_entry_remove function, which
exported for external use.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The assertion functions properly because we currently only truncate the
attr to a zero size. Any other new size of the attr is not preempted.
Make this assertion is specific to the datafork, preparing for
subsequent patches to truncate the attribute to a non-zero size.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Shared GPIOs may be assigned to child nodes of device nodes which don't
themselves bind to any struct device. We need to pass the firmware node
that is the actual consumer to gpiolib-shared and compare against it
instead of unconditionally using the fwnode of the consumer device.
Fixes: a060b8c511ab ("gpiolib: implement low-level, shared GPIO support")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/all/921ba8ce-b18e-4a99-966d-c763d22081e2@nvidia.com/
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260318-gpio-shared-xlate-v2-2-0ce34c707e81@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
OF-based GPIO controller drivers may provide a translation function that
calculates the real chip offset from whatever devicetree sources
provide. We need to take this into account in the shared GPIO management
and call of_xlate() if it's provided and adjust the entry->offset we
initially set when scanning the tree.
To that end: modify the shared GPIO API to take the GPIO chip as
argument on setup (to avoid having to rcu_dereference() it from the GPIO
device) and protect the access to entry->offset with the existing lock.
Fixes: a060b8c511ab ("gpiolib: implement low-level, shared GPIO support")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/all/921ba8ce-b18e-4a99-966d-c763d22081e2@nvidia.com/
Reviewed-by: Linus Walleij <linusw@kernel.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260318-gpio-shared-xlate-v2-1-0ce34c707e81@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
unlink_nv12_plane() will clobber parts of the plane state
potentially already set up by plane_atomic_check(), so we
must make sure not to call the two in the wrong order.
The problem happens when a plane previously selected as
a Y plane is now configured as a normal plane by user space.
plane_atomic_check() will first compute the proper plane
state based on the userspace request, and unlink_nv12_plane()
later clears some of the state.
This used to work on account of unlink_nv12_plane() skipping
the state clearing based on the plane visibility. But I removed
that check, thinking it was an impossible situation. Now when
that situation happens unlink_nv12_plane() will just WARN
and proceed to clobber the state.
Rather than reverting to the old way of doing things, I think
it's more clear if we unlink the NV12 planes before we even
compute the new plane state.
Cc: stable@vger.kernel.org
Reported-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
Closes: https://lore.kernel.org/intel-gfx/20260212004852.1920270-1-khaled.almahallawy@intel.com/
Tested-by: Khaled Almahallawy <khaled.almahallawy@intel.com>
Fixes: 6a01df2f1b2a ("drm/i915: Remove pointless visible check in unlink_nv12_plane()")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260316163953.12905-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
(cherry picked from commit 017ecd04985573eeeb0745fa2c23896fb22ee0cc)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Put the barrier() before the OP so that anything we read out in
OP and check in COND will actually be read out after the timeout
has been evaluated.
Currently the only place where we use OP is __intel_wait_for_register(),
but the use there is precisely susceptible to this reordering, assuming
the ktime_*() stuff itself doesn't act as a sufficient barrier:
__intel_wait_for_register(...)
{
...
ret = __wait_for(reg_value = intel_uncore_read_notrace(...),
(reg_value & mask) == value, ...);
...
}
Cc: stable@vger.kernel.org
Fixes: 1c3c1dc66a96 ("drm/i915: Add compiler barrier to wait_for")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260313110740.24620-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit a464bace0482aa9a83e9aa7beefbaf44cd58e6cf)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
When reading exactly 512 bytes with burst read enabled, the
extra_byte_added path breaks out of the inner do-while without
decrementing len. The outer while(len) then re-enters and gmbus_wait()
times out since all data has been delivered. Decrement len before the
break so the outer loop terminates correctly.
Fixes: d5dc0f43f268 ("drm/i915/gmbus: Enable burst read")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/20260316231920.135438-2-samasth.norway.ananda@oracle.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 4ab0f09ee73fc853d00466682635f67c531f909c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
After this commit (e2b76ab8b5c9 "ksmbd: add support for read compound"),
response buffer management was changed to use dynamic iov array.
In the new design, smb2_calc_max_out_buf_len() expects the second
argument (hdr2_len) to be the offset of ->Buffer field in the
response structure, not a hardcoded magic number.
Fix the remaining call sites to use the correct offsetof() value.
Cc: stable@vger.kernel.org
Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2_lock() has three error handling issues after list_del() detaches
smb_lock from lock_list at no_check_cl:
1) If vfs_lock_file() returns an unexpected error in the non-UNLOCK
path, goto out leaks smb_lock and its flock because the out:
handler only iterates lock_list and rollback_list, neither of
which contains the detached smb_lock.
2) If vfs_lock_file() returns -ENOENT in the UNLOCK path, goto out
leaks smb_lock and flock for the same reason. The error code
returned to the dispatcher is also stale.
3) In the rollback path, smb_flock_init() can return NULL on
allocation failure. The result is dereferenced unconditionally,
causing a kernel NULL pointer dereference. Add a NULL check to
prevent the crash and clean up the bookkeeping; the VFS lock
itself cannot be rolled back without the allocation and will be
released at file or connection teardown.
Fix cases 1 and 2 by hoisting the locks_free_lock()/kfree() to before
the if(!rc) check in the UNLOCK branch so all exit paths share one
free site, and by freeing smb_lock and flock before goto out in the
non-UNLOCK branch. Propagate the correct error code in both cases.
Fix case 3 by wrapping the VFS unlock in an if(rlock) guard and adding
a NULL check for locks_free_lock(rlock) in the shared cleanup.
Found via call-graph analysis using sqry.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Suggested-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Werner Kasselman <werner@verivus.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb_grant_oplock() has two issues in the oplock publication sequence:
1) opinfo is linked into ci->m_op_list (via opinfo_add) before
add_lease_global_list() is called. If add_lease_global_list()
fails (kmalloc returns NULL), the error path frees the opinfo
via __free_opinfo() while it is still linked in ci->m_op_list.
Concurrent m_op_list readers (opinfo_get_list, or direct iteration
in smb_break_all_levII_oplock) dereference the freed node.
2) opinfo->o_fp is assigned after add_lease_global_list() publishes
the opinfo on the global lease list. A concurrent
find_same_lease_key() can walk the lease list and dereference
opinfo->o_fp->f_ci while o_fp is still NULL.
Fix by restructuring the publication sequence to eliminate post-publish
failure:
- Set opinfo->o_fp before any list publication (fixes NULL deref).
- Preallocate lease_table via alloc_lease_table() before opinfo_add()
so add_lease_global_list() becomes infallible after publication.
- Keep the original m_op_list publication order (opinfo_add before
lease list) so concurrent opens via same_client_has_lease() and
opinfo_get_list() still see the in-flight grant.
- Use opinfo_put() instead of __free_opinfo() on err_out so that
the RCU-deferred free path is used.
This also requires splitting add_lease_global_list() to take a
preallocated lease_table and changing its return type from int to void,
since it can no longer fail.
Fixes: 1dfd062caa16 ("ksmbd: fix use-after-free by using call_rcu() for oplock_info")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a multichannel session binding request fails (e.g. wrong password),
the error path unconditionally sets sess->state = SMB2_SESSION_EXPIRED.
However, during binding, sess points to the target session looked up via
ksmbd_session_lookup_slowpath() -- which belongs to another connection's
user. This allows a remote attacker to invalidate any active session by
simply sending a binding request with a wrong password (DoS).
Fix this by skipping session expiration when the failed request was
a binding attempt, since the session does not belong to the current
connection. The reference taken by ksmbd_session_lookup_slowpath() is
still correctly released via ksmbd_user_session_put().
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
|
|
To pick up the changes in:
6ffd853b0b10e1e2 ("build_bug.h: correct function parameters names in kernel-doc")
That just add some comments, addressing this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Please take a look at tools/include/uapi/README for further info on this
synchronization process.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
e2ffe85b6d2bb778 ("KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for further details.
Cc: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
da142f3d373a6dda ("KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlay")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the 'perf trace' ioctl syscall argument beautifiers.
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Please see tools/include/uapi/README for further details.
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the changes from these csets:
9073428bb204d921 ("x86/sev: Allow IBPB-on-Entry feature for SNP guests")
That cause no changes to tooling as it doesn't include a new MSR to be
captured by the tools/perf/trace/beauty/tracepoints/x86_msr.sh script.
Just silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I2C devices with associated pinctrl states (DPAUX I2C controllers)
will change pinctrl state during runtime PM. This requires taking
a mutex, so these devices cannot be marked as IRQ safe.
Add PINCTRL as dependency to avoid build errors.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/all/E1vsNBv-00000009nfA-27ZK@rmk-PC.armlinux.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix how linked registers track zero extension of subregisters (Daniel
Borkmann)
- Fix unsound scalar fork for OR instructions (Daniel Wade)
- Fix exception exit lock check for subprogs (Ihor Solodrai)
- Fix undefined behavior in interpreter for SDIV/SMOD instructions
(Jenny Guanni Qu)
- Release module's BTF when module is unloaded (Kumar Kartikeya
Dwivedi)
- Fix constant blinding for PROBE_MEM32 instructions (Sachin Kumar)
- Reset register ID for END instructions to prevent incorrect value
tracking (Yazhou Tang)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation
bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation
selftests/bpf: Add tests for maybe_fork_scalars() OR vs AND handling
bpf: Fix unsound scalar forking in maybe_fork_scalars() for BPF_OR
selftests/bpf: Add tests for sdiv32/smod32 with INT_MIN dividend
bpf: Fix undefined behavior in interpreter sdiv/smod for INT_MIN
selftests/bpf: Add tests for bpf_throw lock leak from subprogs
bpf: Fix exception exit lock checking for subprogs
bpf: Release module BTF IDR before module unload
selftests/bpf: Fix pkg-config call on static builds
bpf: Fix constant blinding for PROBE_MEM32 stores
selftests/bpf: Add test for BPF_END register ID reset
bpf: Reset register ID for BPF_END value tracking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Revert "tracing: Remove pid in task_rename tracing output"
A change was made to remove the pid field from the task_rename event
because it was thought that it was always done for the current task
and recording the pid would be redundant. This turned out to be
incorrect and there are a few corner case where this is not true and
caused some regressions in tooling.
- Fix the reading from user space for migration
The reading of user space uses a seq lock type of logic where it uses
a per-cpu temporary buffer and disables migration, then enables
preemption, does the copy from user space, disables preemption,
enables migration and checks if there was any schedule switches while
preemption was enabled. If there was a context switch, then it is
considered that the per-cpu buffer could be corrupted and it tries
again. There's a protection check that tests if it takes a hundred
tries, it issues a warning and exits out to prevent a live lock.
This was triggered because the task was selected by the load balancer
to be migrated to another CPU, every time preemption is enabled the
migration task would schedule in try to migrate the task but can't
because migration is disabled and let it run again. This caused the
scheduler to schedule out the task every time it enabled preemption
and made the loop never exit (until the 100 iteration test
triggered).
Fix this by enabling and disabling preemption and keeping migration
enabled if the reading from user space needs to be done again. This
will let the migration thread migrate the task and the copy from user
space will likely pass on the next iteration.
- Fix trace_marker copy option freeing
The "copy_trace_marker" option allows a tracing instance to get a
copy of a write to the trace_marker file of the top level instance.
This is managed by a link list protected by RCU. When an instance is
removed, a check is made if the option is set, and if so
synchronized_rcu() is called.
The problem is that an iteration is made to reset all the flags to
what they were when the instance was created (to perform clean ups)
was done before the check of the copy_trace_marker option and that
option was cleared, so the synchronize_rcu() was never called.
Move the clearing of all the flags after the check of
copy_trace_marker to do synchronize_rcu() so that the option is still
set if it was before and the synchronization is performed.
- Fix entries setting when validating the persistent ring buffer
When validating the persistent ring buffer on boot up, the number of
events per sub-buffer is added to the sub-buffer meta page. The
validator was updating cpu_buffer->head_page (the first sub-buffer of
the per-cpu buffer) and not the "head_page" variable that was
iterating the sub-buffers. This was causing the first sub-buffer to
be assigned the entries for each sub-buffer and not the sub-buffer
that was supposed to be updated.
- Use "hash" value to update the direct callers
When updating the ftrace direct callers, it assigned a temporary
callback to all the callback functions of the ftrace ops and not just
the functions represented by the passed in hash. This causes an
unnecessary slow down of the functions of the ftrace_ops that is not
being modified. Only update the functions that are going to be
modified to call the ftrace loop function so that the update can be
made on those functions.
* tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
tracing: Fix trace_marker copy link list updates
tracing: Fix failure to read user space from system call trace events
tracing: Revert "tracing: Remove pid in task_rename tracing output"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- fix broken I2C communication on Armada 3700 with recovery
- fix device_node reference leak in probe (fsi)
- fix NULL-deref when serial string is missing (cp2615)
* tag 'i2c-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pxa: defer reset on Armada 3700 when recovery is used
i2c: fsi: Fix a potential leak in fsi_i2c_probe()
i2c: cp2615: fix serial string NULL-deref at probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Improve Qemu MCE-injection behavior by only using AMD SMCA MSRs if
the feature bit is set
- Fix the relative path of gettimeofday.c inclusion in vclock_gettime.c
- Fix a boot crash on UV clusters when a socket is marked as
'deconfigured' which are mapped to the SOCK_EMPTY node ID by
the UV firmware, while Linux APIs expect NUMA_NO_NODE.
The difference being (0xffff [unsigned short ~0]) vs [int -1]
* tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Handle deconfigured sockets
x86/entry/vdso: Fix path of included gettimeofday.c
x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
- Fix a PMU driver crash on AMD EPYC systems, caused by
a race condition in x86_pmu_enable()
- Fix a possible counter-initialization bug in x86_pmu_enable()
- Fix a counter inheritance bug in inherit_event() and
__perf_event_read()
- Fix an Intel PMU driver branch constraints handling bug
found by UBSAN
- Fix the Intel PMU driver's new Off-Module Response (OMR)
support code for Diamond Rapids / Nova lake, to fix a snoop
information parsing bug
* tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix OMR snoop information parsing issues
perf/x86/intel: Add missing branch counters constraint apply
perf: Make sure to use pmu_ctx->pmu for groups
x86/perf: Make sure to program the counter value for stopped events on migration
perf/x86: Move event pointer setup earlier in x86_pmu_enable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Fix three more livepatching related build environment bugs, and a
false positive warning with Clang jump tables"
* tag 'objtool-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix Clang jump table detection
livepatch/klp-build: Fix inconsistent kernel version
objtool/klp: fix mkstemp() failure with long paths
objtool/klp: fix data alignment in __clone_symbol()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fix a sparse build error regression in <linux/local_lock_internal.h>
caused by the locking context-analysis changes"
* tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
include/linux/local_lock_internal.h: Make this header file again compatible with sparse
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix a mailbox channel leak in the riscv-rpmi-sysmsi irqchip driver"
* tag 'irq-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-rpmi-sysmsi: Fix mailbox channel leak in rpmi_sysmsi_probe()
|
|
The call to mipi_dsi_host_register triggers a callback to mtk_dsi_bind,
which uses dev_get_drvdata to retrieve the mtk_dsi struct, so this
structure needs to be stored inside the driver data before invoking it.
As drvdata is currently uninitialized it leads to a crash when
registering the DSI DRM encoder right after acquiring
the mode_config.idr_mutex, blocking all subsequent DRM operations.
Fixes the following crash during mediatek-drm probe (tested on Xiaomi
Smart Clock x04g):
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000040
[...]
Modules linked in: mediatek_drm(+) drm_display_helper cec drm_client_lib
drm_dma_helper drm_kms_helper panel_simple
[...]
Call trace:
drm_mode_object_add+0x58/0x98 (P)
__drm_encoder_init+0x48/0x140
drm_encoder_init+0x6c/0xa0
drm_simple_encoder_init+0x20/0x34 [drm_kms_helper]
mtk_dsi_bind+0x34/0x13c [mediatek_drm]
component_bind_all+0x120/0x280
mtk_drm_bind+0x284/0x67c [mediatek_drm]
try_to_bring_up_aggregate_device+0x23c/0x320
__component_add+0xa4/0x198
component_add+0x14/0x20
mtk_dsi_host_attach+0x78/0x100 [mediatek_drm]
mipi_dsi_attach+0x2c/0x50
panel_simple_dsi_probe+0x4c/0x9c [panel_simple]
mipi_dsi_drv_probe+0x1c/0x28
really_probe+0xc0/0x3dc
__driver_probe_device+0x80/0x160
driver_probe_device+0x40/0x120
__device_attach_driver+0xbc/0x17c
bus_for_each_drv+0x88/0xf0
__device_attach+0x9c/0x1cc
device_initial_probe+0x54/0x60
bus_probe_device+0x34/0xa0
device_add+0x5b0/0x800
mipi_dsi_device_register_full+0xdc/0x16c
mipi_dsi_host_register+0xc4/0x17c
mtk_dsi_probe+0x10c/0x260 [mediatek_drm]
platform_probe+0x5c/0xa4
really_probe+0xc0/0x3dc
__driver_probe_device+0x80/0x160
driver_probe_device+0x40/0x120
__driver_attach+0xc8/0x1f8
bus_for_each_dev+0x7c/0xe0
driver_attach+0x24/0x30
bus_add_driver+0x11c/0x240
driver_register+0x68/0x130
__platform_register_drivers+0x64/0x160
mtk_drm_init+0x24/0x1000 [mediatek_drm]
do_one_initcall+0x60/0x1d0
do_init_module+0x54/0x240
load_module+0x1838/0x1dc0
init_module_from_file+0xd8/0xf0
__arm64_sys_finit_module+0x1b4/0x428
invoke_syscall.constprop.0+0x48/0xc8
do_el0_svc+0x3c/0xb8
el0_svc+0x34/0xe8
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Code: 52800022 941004ab 2a0003f3 37f80040 (29005a80)
Fixes: e4732b590a77 ("drm/mediatek: dsi: Register DSI host after acquiring clocks and PHY")
Signed-off-by: Luca Leonardo Scorcia <l.scorcia@gmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20260225094047.76780-1-l.scorcia@gmail.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe a reordering between the worklist
insertion and the last_progress_ts update. Specifically, the watchdog
can see a non-empty worklist (from a list_add) while reading a stale
last_progress_ts value, causing a false positive stall report.
This was confirmed by reading pool->last_progress_ts again after holding
pool->lock in wq_watchdog_timer_fn():
workqueue watchdog: pool 7 false positive detected!
lockless_ts=4784580465 locked_ts=4785033728
diff=453263ms worklist_empty=0
To avoid slowing down the hot path (queue_work, etc.), recheck
last_progress_ts with pool->lock held. This will eliminate the false
positive with minimal overhead.
Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.
Fixes: 82607adcf9cd ("workqueue: implement lockup detector")
Cc: stable@vger.kernel.org # v4.5+
Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
slot_free() basically completely resets the slots by clearing all of
its flags and attributes. While zram_writeback_complete() restores
some of flags back (those that are necessary for async read
decompression) we still lose a lot of slot's metadata. For example,
slot's ac-time, or ZRAM_INCOMPRESSIBLE.
More importantly, restoring flags/attrs requires extra attention as
some of the flags are directly affecting zram device stats. And the
original code did not pay that attention. Namely ZRAM_HUGE slots
handling in zram_writeback_complete(). The call to slot_free() would
decrement ->huge_pages, however when zram_writeback_complete() restored
the slot's ZRAM_HUGE flag, it would not get reflected in an incremented
->huge_pages. So when the slot would finally get freed, slot_free()
would decrement ->huge_pages again, leading to underflow.
Fix this by open-coding the required memory free and stats updates in
zram_writeback_complete(), rather than calling the destructive
slot_free(). Since we now preserve the ZRAM_HUGE flag on written-back
slots (for the deferred decompression path), we also update slot_free()
to skip decrementing ->huge_pages if ZRAM_WB is set.
Link: https://lkml.kernel.org/r/20260320023143.2372879-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20260319034912.1894770-1-senozhatsky@chromium.org
Fixes: d38fab605c667 ("zram: introduce compressed data writeback")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Richard Chang <richardycc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
One major usage of damon_call() is online DAMON parameters update. It is
done by calling damon_commit_ctx() inside the damon_call() callback
function. damon_commit_ctx() can fail for two reasons: 1) invalid
parameters and 2) internal memory allocation failures. In case of
failures, the damon_ctx that attempted to be updated (commit destination)
can be partially updated (or, corrupted from a perspective), and therefore
shouldn't be used anymore. The function only ensures the damon_ctx object
can safely deallocated using damon_destroy_ctx().
The API callers are, however, calling damon_commit_ctx() only after
asserting the parameters are valid, to avoid damon_commit_ctx() fails due
to invalid input parameters. But it can still theoretically fail if the
internal memory allocation fails. In the case, DAMON may run with the
partially updated damon_ctx. This can result in unexpected behaviors
including even NULL pointer dereference in case of damos_commit_dests()
failure [1]. Such allocation failure is arguably too small to fail, so
the real world impact would be rare. But, given the bad consequence, this
needs to be fixed.
Avoid such partially-committed (maybe-corrupted) damon_ctx use by saving
the damon_commit_ctx() failure on the damon_ctx object. For this,
introduce damon_ctx->maybe_corrupted field. damon_commit_ctx() sets it
when it is failed. kdamond_call() checks if the field is set after each
damon_call_control->fn() is executed. If it is set, ignore remaining
callback requests and return. All kdamond_call() callers including
kdamond_fn() also check the maybe_corrupted field right after
kdamond_call() invocations. If the field is set, break the kdamond_fn()
main loop so that DAMON sill doesn't use the context that might be
corrupted.
[sj@kernel.org: let kdamond_call() with cancel regardless of maybe_corrupted]
Link: https://lkml.kernel.org/r/20260320031553.2479-1-sj@kernel.org
Link: https://sashiko.dev/#/patchset/20260319145218.86197-1-sj%40kernel.org
Link: https://lkml.kernel.org/r/20260319145218.86197-1-sj@kernel.org
Link: https://lore.kernel.org/20260319043309.97966-1-sj@kernel.org [1]
Fixes: 3301f1861d34 ("mm/damon/sysfs: handle commit command using damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 542eda1a8329 ("mm/rmap: improve anon_vma_clone(),
unlink_anon_vmas() comments, add asserts") alters the way errors are
handled, but overlooked one important aspect of clean up.
When a VMA encounters an error state in anon_vma_clone() (that is, on
attempted allocation of anon_vma_chain objects), it cleans up partially
established state in cleanup_partial_anon_vmas(), before returning an
error.
However, this occurs prior to anon_vma->num_active_vmas being incremented,
and it also fails to clear the VMA's vma->anon_vma field, which remains in
place.
This is immediately an inconsistent state, because
anon_vma->num_active_vmas is supposed to track the number of VMAs whose
vma->anon_vma field references that anon_vma, and now that count is
off-by-negative-1 for each VMA for which this error state has occurred.
When VMAs are unlinked from this anon_vma, unlink_anon_vmas() will
eventually underflow anon_vma->num_active_vmas, which will trigger a
warning.
This will always eventually happen, as we unlink anon_vma's at process
teardown.
It could also cause maybe_reuse_anon_vma() to incorrectly permit the reuse
of an anon_vma which has active VMAs attached, which will lead to a
persistently invalid state.
The solution is to clear the VMA's anon_vma field when we clean up partial
state, as the fact we are doing so indicates clearly that the VMA is not
correctly integrated into the anon_vma tree and thus this field is
invalid.
Link: https://lkml.kernel.org/r/20260318122632.63404-1-ljs@kernel.org
Fixes: 542eda1a8329 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20260302151547.2389070-1-sashal@kernel.org/
Reported-by: Jiakai Xu <jiakaipeanut@gmail.com>
Closes: https://lore.kernel.org/linux-mm/CAFb8wJvRhatRD-9DVmr5v5pixTMPEr3UKjYBJjCd09OfH55CKg@mail.gmail.com/
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Tested-by: Jiakai Xu <jiakaipeanut@gmail.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
with cpu_to_node(), while node (for prev_cpu) was computed with
scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
idle_cpumask(waker_node) is called with a real node ID even though
per-node idle tracking is disabled, resulting in undefined behavior.
Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
both variables are computed consistently.
Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Generalize driver_override in the driver core, providing a common
sysfs implementation and concurrency-safe accessors for bus
implementations
- Do not use driver_override as IRQ name in the hwmon axi-fan driver
- Remove an unnecessary driver_override check in sh platform_early
- Migrate the platform bus to use the generic driver_override
infrastructure, fixing a UAF condition caused by accessing the
driver_override field without proper locking in the platform_match()
callback
* tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
driver core: platform: use generic driver_override infrastructure
sh: platform_early: remove pdev->driver_override check
hwmon: axi-fan: don't use driver_override as IRQ name
docs: driver-model: document driver_override
driver core: generalize driver_override in struct device
|
|
The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger
the slow path for all direct callers to be able to safely modify attached
addresses.
At the moment we use ops->func_hash for tmp_ops filter, which represents all
the systems attachments. It's faster to use just the passed hash filter, which
contains only the modified sites and is always a subset of the ops->func_hash.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Cc: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org
Fixes: e93672f770d7 ("ftrace: Add update_ftrace_direct_mod function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the "copy_trace_marker" option is enabled for an instance, anything
written into /sys/kernel/tracing/trace_marker is also copied into that
instances buffer. When the option is set, that instance's trace_array
descriptor is added to the marker_copies link list. This list is protected
by RCU, as all iterations uses an RCU protected list traversal.
When the instance is deleted, all the flags that were enabled are cleared.
This also clears the copy_trace_marker flag and removes the trace_array
descriptor from the list.
The issue is after the flags are called, a direct call to
update_marker_trace() is performed to clear the flag. This function
returns true if the state of the flag changed and false otherwise. If it
returns true here, synchronize_rcu() is called to make sure all readers
see that its removed from the list.
But since the flag was already cleared, the state does not change and the
synchronization is never called, leaving a possible UAF bug.
Move the clearing of all flags below the updating of the copy_trace_marker
option which then makes sure the synchronization is performed.
Also use the flag for checking the state in update_marker_trace() instead
of looking at if the list is empty.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home
Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.
There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:
pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.
Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.
To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This reverts commit e3f6a42272e028c46695acc83fc7d7c42f2750ad.
The commit says that the tracepoint only deals with the current task,
however the following case is not current task:
comm_write() {
p = get_proc_task(inode);
if (!p)
return -ESRCH;
if (same_thread_group(current, p))
set_task_comm(p, buffer);
}
where set_task_comm() calls __set_task_comm() which records
the update of p and not current.
So revert the patch to show pid.
Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <elver@google.com>
Cc: <kees@kernel.org>
Link: https://patch.msgid.link/20260306075954.4533-1-xuewen.yan@unisoc.com
Fixes: e3f6a42272e0 ("tracing: Remove pid in task_rename tracing output")
Reported-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add multiple test cases for linked register tracking with alu32 ops:
- Add a test that checks sync_linked_regs() regarding reg->id (the linked
target register) for BPF_ADD_CONST32 rather than known_reg->id (the
branch register).
- Add a test case for linked register tracking that exposes the cross-type
sync_linked_regs() bug. One register uses alu32 (w7 += 1, BPF_ADD_CONST32)
and another uses alu64 (r8 += 2, BPF_ADD_CONST64), both linked to the
same base register.
- Add a test case that exercises regsafe() path pruning when two execution
paths reach the same program point with linked registers carrying
different ADD_CONST flags (BPF_ADD_CONST32 from alu32 vs BPF_ADD_CONST64
from alu64). This particular test passes with and without the fix since
the pruning will fail due to different ranges, but it would still be
useful to carry this one as a regression test for the unreachable div
by zero.
With the fix applied all the tests pass:
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
[...]
./test_progs -t verifier_linked_scalars
#602/1 verifier_linked_scalars/scalars: find linked scalars:OK
#602/2 verifier_linked_scalars/sync_linked_regs_preserves_id:OK
#602/3 verifier_linked_scalars/scalars_neg:OK
#602/4 verifier_linked_scalars/scalars_neg_sub:OK
#602/5 verifier_linked_scalars/scalars_neg_alu32_add:OK
#602/6 verifier_linked_scalars/scalars_neg_alu32_sub:OK
#602/7 verifier_linked_scalars/scalars_pos:OK
#602/8 verifier_linked_scalars/scalars_sub_neg_imm:OK
#602/9 verifier_linked_scalars/scalars_double_add:OK
#602/10 verifier_linked_scalars/scalars_sync_delta_overflow:OK
#602/11 verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
#602/12 verifier_linked_scalars/scalars_alu32_big_offset:OK
#602/13 verifier_linked_scalars/scalars_alu32_basic:OK
#602/14 verifier_linked_scalars/scalars_alu32_wrap:OK
#602/15 verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
#602/16 verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
#602/17 verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
#602/18 verifier_linked_scalars/alu32_negative_offset:OK
#602/19 verifier_linked_scalars/spurious_precision_marks:OK
#602 verifier_linked_scalars:OK
Summary: 1/19 PASSED, 0 SKIPPED, 0 FAILED
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Jenny reported that in sync_linked_regs() the BPF_ADD_CONST32 flag is
checked on known_reg (the register narrowed by a conditional branch)
instead of reg (the linked target register created by an alu32 operation).
Example case with reg:
1. r6 = bpf_get_prandom_u32()
2. r7 = r6 (linked, same id)
3. w7 += 5 (alu32 -- r7 gets BPF_ADD_CONST32, zero-extended by CPU)
4. if w6 < 0xFFFFFFFC goto safe (narrows r6 to [0xFFFFFFFC, 0xFFFFFFFF])
5. sync_linked_regs() propagates to r7 but does NOT call zext_32_to_64()
6. Verifier thinks r7 is [0x100000001, 0x100000004] instead of [1, 4]
Since known_reg above does not have BPF_ADD_CONST32 set above, zext_32_to_64()
is never called on alu32-derived linked registers. This causes the verifier
to track incorrect 64-bit bounds, while the CPU correctly zero-extends the
32-bit result.
The code checking known_reg->id was correct however (see scalars_alu32_wrap
selftest case), but the real fix needs to handle both directions - zext
propagation should be done when either register has BPF_ADD_CONST32, since
the linked relationship involves a 32-bit operation regardless of which
side has the flag.
Example case with known_reg (exercised also by scalars_alu32_wrap):
1. r1 = r0; w1 += 0x100 (alu32 -- r1 gets BPF_ADD_CONST32)
2. if r1 > 0x80 - known_reg = r1 (has BPF_ADD_CONST32), reg = r0 (doesn't)
Hence, fix it by checking for (reg->id | known_reg->id) & BPF_ADD_CONST32.
Moreover, sync_linked_regs() also has a soundness issue when two linked
registers used different ALU widths: one with BPF_ADD_CONST32 and the
other with BPF_ADD_CONST64. The delta relationship between linked registers
assumes the same arithmetic width though. When one register went through
alu32 (CPU zero-extends the 32-bit result) and the other went through
alu64 (no zero-extension), the propagation produces incorrect bounds.
Example:
r6 = bpf_get_prandom_u32() // fully unknown
if r6 >= 0x100000000 goto out // constrain r6 to [0, U32_MAX]
r7 = r6
w7 += 1 // alu32: r7.id = N | BPF_ADD_CONST32
r8 = r6
r8 += 2 // alu64: r8.id = N | BPF_ADD_CONST64
if r7 < 0xFFFFFFFF goto out // narrows r7 to [0xFFFFFFFF, 0xFFFFFFFF]
At the branch on r7, sync_linked_regs() runs with known_reg=r7
(BPF_ADD_CONST32) and reg=r8 (BPF_ADD_CONST64). The delta path
computes:
r8 = r7 + (delta_r8 - delta_r7) = 0xFFFFFFFF + (2 - 1) = 0x100000000
Then, because known_reg->id has BPF_ADD_CONST32, zext_32_to_64(r8) is
called, truncating r8 to [0, 0]. But r8 used a 64-bit ALU op -- the
CPU does NOT zero-extend it. The actual CPU value of r8 is
0xFFFFFFFE + 2 = 0x100000000, not 0. The verifier now underestimates
r8's 64-bit bounds, which is a soundness violation.
Fix sync_linked_regs() by skipping propagation when the two registers
have mixed ALU widths (one BPF_ADD_CONST32, the other BPF_ADD_CONST64).
Lastly, fix regsafe() used for path pruning: the existing checks used
"& BPF_ADD_CONST" to test for offset linkage, which treated
BPF_ADD_CONST32 and BPF_ADD_CONST64 as equivalent.
Fixes: 7a433e519364 ("bpf: Support negative offsets, BPF_SUB, and alu32 for linked register tracking")
Reported-by: Jenny Guanni Qu <qguanni@gmail.com>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|