| Age | Commit message (Collapse) | Author |
|
fsl_xcvr_activate_ctl() has
lockdep_assert_held(&card->snd_card->controls_rwsem),
but fsl_xcvr_mode_put() calls it without acquiring this lock.
Other callers of fsl_xcvr_activate_ctl() in fsl_xcvr_startup() and
fsl_xcvr_shutdown() properly acquire the lock with down_read()/up_read().
Add the missing down_read()/up_read() calls around fsl_xcvr_activate_ctl()
in fsl_xcvr_mode_put() to fix the lockdep assertion and prevent potential
race conditions when multiple userspace threads access the control.
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Link: https://patch.msgid.link/20260202174112.2018402-1-n7l8m4@u.northwestern.edu
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add compatible string ti,tlv320aic23 to fix below CHECK_DTB warning:
arch/arm/boot/dts/nxp/imx/imx35-eukrea-mbimxsd35-baseboard.dtb:
/soc/bus@43f00000/i2c@43f80000/codec@1a: failed to match any schema with compatible: ['ti,tlv320aic23']
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20260202205758.3044617-1-Frank.Li@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fixes: 4a767b1d039a8 ("ASoC: amd: add acp3x pdm driver dma ops")
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Link: https://patch.msgid.link/20260202205034.7697-1-chris.bainbridge@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The FTIDE010 has been missing some timing settings since its
inception, since the upstream OpenWrt patch was missing these.
The community has since come up with the appropriate timings.
Fixes: be4e456ed3a5 ("ata: Add driver for Faraday Technology FTIDE010")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
During the investigation of the various transition mode issues
instrumentation revealed that the amount of bitmap operations can be
significantly reduced when a task with a transitional CID schedules out
after the fixup function completed and disabled the transition mode.
At that point the mode is stable and therefore it is not required to drop
the transitional CID back into the pool. As the fixup is complete the
potential exhaustion of the CID pool is not longer possible, so the CID can
be transferred to the scheduling out task or to the CPU depending on the
current ownership mode.
The racy snapshot of mm_cid::mode which contains both the ownership state
and the transition bit is valid because runqueue lock is held and the fixup
function of a concurrent mode switch is serialized.
Assigning the ownership right there not only spares the bitmap access for
dropping the CID it also avoids it when the task is scheduled back in as it
directly hits the fast path in both modes when the CID is within the
optimal range. If it's outside the range the next schedule in will need to
converge so dropping it right away is sensible. In the good case this also
allows to go into the fast path on the next schedule in operation.
With a thread pool benchmark which is configured to cross the mode switch
boundaries frequently this reduces the number of bitmap operations by about
30% and increases the fastpath utilization in the low single digit
percentage range.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260201192835.100194627@kernel.org
|
|
When a exiting task initiates the switch from per CPU back to per task
mode, it has already dropped its CID and marked itself inactive. But a
leftover from an earlier iteration of the rework then reassigns the per
CPU CID to the exiting task with the transition bit set.
That's wrong as the task is already marked CID inactive, which means it is
inconsistent state. It's harmless because the CID is marked in transit and
therefore dropped back into the pool when the exiting task schedules out
either through preemption or the final schedule().
Simply drop the per CPU CID when the exiting task triggered the transition.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260201192835.032221009@kernel.org
|
|
Shrikanth reported a hard lockup which he observed once. The stack trace
shows the following CID related participants:
watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188
NIP: mm_get_cid+0xe8/0x188
LR: mm_get_cid+0x108/0x188
mm_cid_switch_to+0x3c4/0x52c
__schedule+0x47c/0x700
schedule_idle+0x3c/0x64
do_idle+0x160/0x1b0
cpu_startup_entry+0x48/0x50
start_secondary+0x284/0x288
start_secondary_prolog+0x10/0x14
watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c
NIP: plpar_hcall_norets_notrace+0x18/0x2c
LR: queued_spin_lock_slowpath+0xd88/0x15d0
_raw_spin_lock+0x80/0xa0
raw_spin_rq_lock_nested+0x3c/0xf8
mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
sched_mm_cid_exit+0x108/0x22c
do_exit+0xf4/0x5d0
make_task_dead+0x0/0x178
system_call_exception+0x128/0x390
system_call_vectored_common+0x15c/0x2ec
The task on CPU11 is running the CID ownership mode change fixup function
and is stuck on a runqueue lock. The task on CPU23 is trying to get a CID
from the pool with the same runqueue lock held, but the pool is empty.
After decoding a similar issue in the opposite direction switching from per
task to per CPU mode the tool which models the possible scenarios failed to
come up with a similar loop hole.
This showed up only once, was not reproducible and according to tooling not
related to a overlooked scheduling scenario permutation. But the fact that
it was observed on a PowerPC system gave the right hint: PowerPC is a
weakly ordered architecture.
The transition mechanism does:
WRITE_ONCE(mm->mm_cid.transit, MM_CID_TRANSIT);
WRITE_ONCE(mm->mm_cid.percpu, new_mode);
fixup()
WRITE_ONCE(mm->mm_cid.transit, 0);
mm_cid_schedin() does:
if (!READ_ONCE(mm->mm_cid.percpu))
...
cid |= READ_ONCE(mm->mm_cid.transit);
so weakly ordered systems can observe percpu == false and transit == 0 even
if the fixup function has not yet completed. As a consequence the task will
not drop the CID when scheduling out before the fixup is completed, which
means the CID space can be exhausted and the next task scheduling in will
loop in mm_get_cid() and the fixup thread can livelock on the held runqueue
lock as above.
This could obviously be solved by using:
smp_store_release(&mm->mm_cid.percpu, true);
and
smp_load_acquire(&mm->mm_cid.percpu);
but that brings a memory barrier back into the scheduler hotpath, which was
just designed out by the CID rewrite.
That can be completely avoided by combining the per CPU mode and the
transit storage into a single mm_cid::mode member and ordering the stores
against the fixup functions to prevent the CPU from reordering them.
That makes the update of both states atomic and a concurrent read observes
always consistent state.
The price is an additional AND operation in mm_cid_schedin() to evaluate
the per CPU or the per task path, but that's in the noise even on strongly
ordered architectures as the actual load can be significantly more
expensive and the conditional branch evaluation is there anyway.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Closes: https://lore.kernel.org/bdfea828-4585-40e8-8835-247c6a8a76b0@linux.ibm.com
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260201192834.965217106@kernel.org
|
|
Ihor reported a BPF CI failure which turned out to be a live lock in the
MM_CID management. The scenario is:
A test program creates the 5th thread, which means the MM_CID users become
more than the number of CPUs (four in this example), so it switches to per
CPU ownership mode.
At this point each live task of the program has a CID associated. Assume
thread creation order assignment for simplicity.
T0 CID0 runs fork() and creates T4
T1 CID1
T2 CID2
T3 CID3
T4 --- not visible yet
T0 sets mm_cid::percpu = true and transfers its own CID to CPU0 where it
runs on and then starts the fixup which walks through the threads to
transfer the per task CIDs either to the CPU the task is running on or drop
it back into the pool if the task is not on a CPU.
During that T1 - T3 are free to schedule in and out before the fixup caught
up with them. Going through all possible permutations with a python script
revealed a few problematic cases. The most trivial one is:
T1 schedules in on CPU1 and observes percpu == true, so it transfers
its CID to CPU1
T1 is migrated to CPU2 and schedule in observes percpu == true, but
CPU2 does not have a CID associated and T1 transferred its own to
CPU1
So it has to allocate one with CPU2 runqueue lock held, but the
pool is empty, so it keeps looping in mm_get_cid().
Now T0 reaches T1 in the thread walk and tries to lock the corresponding
runqueue lock, which is held causing a full live lock.
There is a similar scenario in the reverse direction of switching from per
CPU to task mode which is way more obvious and got therefore addressed by
an intermediate mode. In this mode the CIDs are marked with MM_CID_TRANSIT,
which means that they are neither owned by the CPU nor by the task. When a
task schedules out with a transit CID it drops the CID back into the pool
making it available for others to use temporarily. Once the task which
initiated the mode switch finished the fixup it clears the transit mode and
the process goes back into per task ownership mode.
Unfortunately this insight was not mapped back to the task to CPU mode
switch as the above described scenario was not considered in the analysis.
Apply the same transit mechanism to the task to CPU mode switch to handle
these problematic cases correctly.
As with the CPU to task transition this results in a potential temporary
contention on the CID bitmap, but that's only for the time it takes to
complete the transition. After that it stays in steady mode which does not
touch the bitmap at all.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Closes: https://lore.kernel.org/2b7463d7-0f58-4e34-9775-6e2115cfb971@linux.dev
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260201192834.897115238@kernel.org
|
|
Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>:
Currently, hda_sdw_bpt_dma_prepare() get a HDA stream and use the link
DMA but doesn't reserve it. It works fine because we assume the
SwoundWire BPT will not run with audio streams simultaneously. Create
and use the new helpers to reserve the link DMA and allow running BPT
and audio stream simultaneously.
Pierre adds:
For the record this solution has two issues not documented in any commit
message:
a) this will not work in 'dspless' mode, where the link DMA is not
enabled. That's probably fine given that no one used that mode in
production, but that's a software restriction that you will not be able
to undo.
b) this raise the question of how bandwidth will be managed. The premise
of BPT is that it uses all the bus bandwidth to guarantee predictable
firmware download times. If the available bandwidth is restricted by
other audio streams, then mechanically the startup latency will be
increased and vary - or you will have to run the bus at a higher
frequency to provision enough bandwidth for BPT but that means higher
power consumption. Or you will have to change the bus clock dynamically
which is possible at the hardware level for SDCA parts but not legacy
ones.
I am not going to lay on the tracks for this low-level set of changes,
but you'll have to address the b) opens for future contributions.
|
|
Add support for Samsung's S2MPG11 PMIC, which is a Power Management IC
for mobile applications with buck converters, various LDOs, power
meters, NTC thermistor inputs, and additional GPIO interfaces. It
typically complements an S2MPG10 PMIC in a main/sub configuration as
the sub-PMIC.
Like S2MPG10, communication is not via I2C, but via the Samsung ACPM
firmware.
While at it, we can also switch to asynchronous probe, which helps with
probe performance, as the drivers for s2mpg10 and s2mpg11 can probe in
parallel.
Note: The firmware uses the ACPM channel ID and the Speedy channel ID
to select the PMIC address. Since these are firmware properties, they
can not be retrieved from DT, but instead are deducted from the
compatible for now.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20260122-s2mpg1x-regulators-v7-9-3b1f9831fffd@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Bucks can reasonably be supplies for LDOs, but not the other way
around. Since rail registration is going to be ordered by 'enum
s2mpg10_regulators', it makes sense to specify bucks first, so that
during LDO registration it is more likely that the corresponding supply
is known already.
This can improve probe speed, as no unnecessary deferrals and retries
are required anymore.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20260122-s2mpg1x-regulators-v7-8-3b1f9831fffd@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
The Samsung S2MPG11 PMIC is similar to the existing S2MPG10 PMIC
supported by this binding, but still differs enough from it to justify
a separate binding.
It is a Power Management IC for mobile applications with buck
converters, various LDOs, power meters, NTC thermistor inputs, and
additional GPIO interfaces and typically complements an S2MPG10 PMIC in
a main/sub configuration as the sub-PMIC.
Like S2MPG10, communication is via the Samsung ACPM firmware and it
therefore needs to be a child of the ACPM firmware node.
Add the PMIC, the regulators node, and the supply inputs of the
regulator rails, with the supply names matching the datasheet.
Note: S2MPG11 is typically used as the sub-PMIC together with an
S2MPG10 PMIC in a main/sub configuration, hence the datasheet and the
binding both suffix the supplies with an 's'.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20260122-s2mpg1x-regulators-v7-6-3b1f9831fffd@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Update the regulators node to link to the correct and expected
samsung,s2mpg10-regulators binding, in order to describe the regulators
available on this PMIC.
Additionally, describe the supply inputs of the regulator rails, with
the supply names matching the datasheet.
While at it, update the description and example slightly.
Note: S2MPG10 is typically used as the main-PMIC together with an
S2MPG11 PMIC in a main/sub configuration, hence the datasheet and the
binding both suffix the supplies with an 'm'.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20260122-s2mpg1x-regulators-v7-5-3b1f9831fffd@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
The samsung,s2mpg10-pmic binding is going to acquire various additional
properties. To avoid making the common samsung,s2mps11 binding file too
complicated due to additional nesting, split s2mpg10 out into its own
file.
As a side-effect, the oneOf for the interrupts is not required anymore,
as the required: node is at the top-level now.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20260122-s2mpg1x-regulators-v7-4-3b1f9831fffd@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
pwrseq_power_on() calls pwrseq_unit_disable() when the
post_enable callback fails. However, this call is outside the
scoped_guard(mutex, &pwrseq->state_lock) block that ends.
pwrseq_unit_disable() has lockdep_assert_held(&pwrseq->state_lock),
which will fail when called from this error path.
Add the scoped_guard block to cover the post_enable callback and its
error handling to ensure the lock is held when pwrseq_unit_disable() is
called.
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Link: https://patch.msgid.link/20260130182651.1576579-1-n7l8m4@u.northwestern.edu
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
meaning in PLIC
In PLIC, interrupt source 0 is reserved and should not be used.
Therefore, the valid interrupt sources are from 1 to riscv,ndev
inclusive.
Update the documentation to clarify this point.
[ tglx: Fixup subject prefix ]
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/tencent_720A4669773B1EE15EC720869C35C2F0490A@qq.com
|
|
Since commit 9880702d123f2 ("ACPI: property: Support using strings in
reference properties") it is possible to use strings instead of local
references. This work fine with single GPIO but not with arrays as
acpi_gpio_package_count() didn't handle this case. Update it to handle
strings like local references to cover this case as well.
Signed-off-by: Alban Bedel <alban.bedel@lht.dlh.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://patch.msgid.link/20260129145944.3372777-1-alban.bedel@lht.dlh.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
The driver is handling the number of hardware interrupts inconsistently.
The reason is that the firmware enumerates the maximum number of device
interrupts, but the actual number of hardware interrupts is one more
because hardware interrupt 0 is reserved.
There are two loop variants where this matters:
1) Iterating over the device interrupts
for (irq = 1; irq < total_irqs; irq++)
2) Iterating over the number of interrupt register groups
for (grp = 0; grp < irq_groups; grp++)
The current code stores the number of device interrupts and that requires
to write the loops as:
1) for (irq = 1; irq <= device_irqs; irq++)
2) for (grp = 0; grp < DIV_ROUND_UP(device_irqs + 1); grp++)
But the code gets it wrong all over the place. Just fixing up the
conditions and off by ones is not a sustainable solution as the next changes
will reintroduce the same bugs over and over.
Sanitize it by storing the total number of hardware interrupts during probe
and precalculating the number of groups. To future proof it mark
priv::total_irqs __private, provide a correct iterator macro and adjust the
code to this.
Marking it private allows sparse (C=1 build) to catch direct access to this
member:
drivers/irqchip/irq-sifive-plic.c:270:9: warning: dereference of noderef expression
That should prevent at least the most obvious future damage in that area.
Fixes: e80f0b6a2cf3 ("irqchip/irq-sifive-plic: Add syscore callbacks for hibernation")
Reported-by: Yangyu Chen <cyy@cyyself.name>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Yangyu Chen <cyy@cyyself.name>
Link: https://patch.msgid.link/87ikcd36i9.ffs@tglx
|
|
While SLAB_OBJ_EXT_IN_OBJ allows to reduce memory overhead to account
slab objects, it prevents slab merging because merging can change
the metadata layout.
As pointed out Vlastimil Babka, disabling merging solely for this memory
optimization may not be a net win, because disabling slab merging tends
to increase overall memory usage.
Restrict SLAB_OBJ_EXT_IN_OBJ to caches that are already unmergeable for
other reasons (e.g., those with constructors or SLAB_TYPESAFE_BY_RCU).
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260127103151.21883-3-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When a cache has high s->align value and s->object_size is not aligned
to it, each object ends up with some unused space because of alignment.
If this wasted space is big enough, we can use it to store the
slabobj_ext metadata instead of wasting it.
On my system, this happens with caches like kmem_cache, mm_struct, pid,
task_struct, sighand_cache, xfs_inode, and others.
To place the slabobj_ext metadata within each object, the existing
slab_obj_ext() logic can still be used by setting:
- slab->obj_exts = slab_address(slab) + (slabobj_ext offset)
- stride = s->size
slab_obj_ext() doesn't need know where the metadata is stored,
so this method works without adding extra overhead to slab_obj_ext().
A good example benefiting from this optimization is xfs_inode
(object_size: 992, align: 64). To measure memory savings, 2 millions of
files were created on XFS.
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating ~2.64M directories on xfs):
Slab: 5175976 kB
SReclaimable: 3837524 kB
SUnreclaim: 1338452 kB
After patch (creating ~2.64M directories on xfs):
Slab: 5152912 kB
SReclaimable: 3838568 kB
SUnreclaim: 1314344 kB (-23.54 MiB)
Enjoy the memory savings!
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-10-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
To access SLUB's internal implementation details beyond cache flags in
ksize(), move __ksize(), ksize(), and slab_ksize() to mm/slub.c.
[vbabka@suse.cz: also make __ksize() static and move its kerneldoc to
ksize() ]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-9-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The leftover space in a slab is always smaller than s->size, and
kmem caches for large objects that are not power-of-two sizes tend to have
a greater amount of leftover space per slab. In some cases, the leftover
space is larger than the size of the slabobj_ext array for the slab.
An excellent example of such a cache is ext4_inode_cache. On my system,
the object size is 1136, with a preferred order of 3, 28 objects per slab,
and 960 bytes of leftover space per slab.
Since the size of the slabobj_ext array is only 224 bytes (w/o mem
profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
fits within the leftover space.
Allocate the slabobj_exts array from this unused space instead of using
kcalloc() when it is large enough. The array is allocated from unused
space only when creating new slabs, and it doesn't try to utilize unused
space if alloc_slab_obj_exts() is called after slab creation because
implementing lazy allocation involves more expensive synchronization.
The implementation and evaluation of lazy allocation from unused space
is left as future-work. As pointed by Vlastimil Babka [1], it could be
beneficial when a slab cache without SLAB_ACCOUNT can be created, and
some of the allocations from the cache use __GFP_ACCOUNT. For example,
xarray does that.
To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
array only when either of them is enabled on slab allocation.
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating ~2.64M directories on ext4):
Slab: 4747880 kB
SReclaimable: 4169652 kB
SUnreclaim: 578228 kB
After patch (creating ~2.64M directories on ext4):
Slab: 4724020 kB
SReclaimable: 4169188 kB
SUnreclaim: 554832 kB (-22.84 MiB)
Enjoy the memory savings!
Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-8-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
In the near future, slabobj_ext may reside outside the allocated slab
object range within a slab, which could be reported as an out-of-bounds
access by KASAN.
As suggested by Andrey Konovalov [1], explicitly disable KASAN and KMSAN
checks when accessing slabobj_ext within slab allocator, memory profiling,
and memory cgroup code. While an alternative approach could be to unpoison
slabobj_ext, out-of-bounds accesses outside the slab allocator are
generally more common.
Move metadata_access_enable()/disable() helpers to mm/slab.h so that
it can be used outside mm/slub.c. However, as suggested by Suren
Baghdasaryan [2], instead of calling them directly from mm code (which is
more prone to errors), change users to access slabobj_ext via get/put
APIs:
- Users should call get_slab_obj_exts() to access slabobj_metadata
and call put_slab_obj_exts() when it's done.
- From now on, accessing it outside the section covered by
get_slab_obj_exts() ~ put_slab_obj_exts() is illegal.
This ensures that accesses to slabobj_ext metadata won't be reported
as access violations.
Call kasan_reset_tag() in slab_obj_ext() before returning the address to
prevent SW or HW tag-based KASAN from reporting false positives.
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/linux-mm/CA+fCnZezoWn40BaS3cgmCeLwjT+5AndzcQLc=wH3BjMCu6_YCw@mail.gmail.com [1]
Link: https://lore.kernel.org/linux-mm/CAJuCfpG=Lb4WhYuPkSpdNO4Ehtjm1YcEEK0OM=3g9i=LxmpHSQ@mail.gmail.com [2]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-7-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Use a configurable stride value when accessing slab object extension
metadata instead of assuming a fixed sizeof(struct slabobj_ext).
Store stride value in free bits of slab->counters field. This allows
for flexibility in cases where the extension is embedded within
slab objects.
Since these free bits exist only on 64-bit, any future optimizations
that need to change stride value cannot be enabled on 32-bit architectures.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-6-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Currently, the slab allocator assumes that slab->obj_exts is a pointer
to an array of struct slabobj_ext objects. However, to support storage
methods where struct slabobj_ext is embedded within objects, the slab
allocator should not make this assumption. Instead of directly
dereferencing the slabobj_exts array, abstract access to
struct slabobj_ext via helper functions.
Introduce a new API slabobj_ext metadata access:
slab_obj_ext(slab, obj_exts, index) - returns the pointer to
struct slabobj_ext element at the given index.
Directly dereferencing the return value of slab_obj_exts() is no longer
allowed. Instead, slab_obj_ext() must always be used to access
individual struct slabobj_ext objects.
Convert all users to use these APIs.
No functional changes intended.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-5-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Convert ext4_inode_cache to use the kmem_cache_args interface and
specify a free pointer offset.
Since ext4_inode_cache uses a constructor, the free pointer would be
placed after the object to prevent overwriting fields used by the
constructor.
However, some fields such as ->i_flags are not used by the constructor
and can safely be repurposed for the free pointer.
Specify the free pointer offset at i_flags to reduce the object size.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-4-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When a slab cache has a constructor, the free pointer is placed after the
object because certain fields must not be overwritten even after the
object is freed.
However, some fields that the constructor does not initialize can safely
be overwritten after free. Allow specifying the free pointer offset within
the object, reducing the overall object size when some fields can be reused
for the free pointer.
Adjust the document accordingly.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-3-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When both KASAN and SLAB_STORE_USER are enabled, accesses to
struct kasan_alloc_meta fields can be misaligned on 64-bit architectures.
This occurs because orig_size is currently defined as unsigned int,
which only guarantees 4-byte alignment. When struct kasan_alloc_meta is
placed after orig_size, it may end up at a 4-byte boundary rather than
the required 8-byte boundary on 64-bit systems.
Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS
are assumed to require 64-bit accesses to be 64-bit aligned.
See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert:
"ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details.
Change orig_size from unsigned int to unsigned long to ensure proper
alignment for any subsequent metadata. This should not waste additional
memory because kmalloc objects are already aligned to at least
ARCH_KMALLOC_MINALIGN.
Closes: https://lore.kernel.org/all/aPrLF0OUK651M4dk@hyeyoo
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Closes: https://lore.kernel.org/all/aPrLF0OUK651M4dk@hyeyoo/
Link: https://patch.msgid.link/20260113061845.159790-2-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The comments above check_pad_bytes() document the field layout of a
single object. Rewrite them to improve clarity and precision.
Also update an outdated comment in calculate_sizes().
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Hao Li <hao.li@linux.dev>
Link: https://patch.msgid.link/20251229122415.192377-1-hao.li@linux.dev
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
When allocating slabobj_ext array in alloc_slab_obj_exts(), the array
can be allocated from the same slab we're allocating the array for.
This led to obj_exts_in_slab() incorrectly returning true [1],
although the array is not allocated from wasted space of the slab.
Vlastimil Babka observed that this problem should be fixed even when
ignoring its incompatibility with obj_exts_in_slab(), because it creates
slabs that are never freed as there is always at least one allocated
object.
To avoid this, use the next kmalloc size or large kmalloc when
the array can be allocated from the same cache we're allocating
the array for.
In case of random kmalloc caches, there are multiple kmalloc caches
for the same size and the cache is selected based on the caller address.
Because it is fragile to ensure the same caller address is passed to
kmalloc_slab(), kmalloc_noprof(), and kmalloc_node_noprof(), bump the
size to (s->object_size + 1) when the sizes are equal, instead of
directly comparing the kmem_cache pointers.
Note that this doesn't happen when memory allocation profiling is
disabled, as when the allocation of the array is triggered by memory
cgroup (KMALLOC_CGROUP), the array is allocated from KMALLOC_NORMAL.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com [1]
Cc: stable@vger.kernel.org
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260126125714.88008-1-harry.yoo@oracle.com
Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The revocable code is still under active discussion, and there is no
in-kernel users of it. So disable it from the build for now so that no
one suffers from it being present in the tree, yet leave it in the
source tree so that others can easily test it by reverting this commit
and building off of it for future releases.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/2026020307-rimmed-dreamy-5a67@gregkh
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The mgag200_bmc_stop_scanout() function is called by the .atomic_disable()
handler for the MGA G200 VGA BMC encoder. This function performs a few
register writes to inform the BMC of an upcoming mode change, and then
polls to wait until the BMC actually stops.
The polling is implemented using a busy loop with udelay() and an iteration
timeout of 300, resulting in the function blocking for 300 milliseconds.
The function gets called ultimately by the output_poll_execute work thread
for the DRM output change polling thread of the mgag200 driver:
kworker/0:0-mm_ 3528 [000] 4555.315364:
ffffffffaa0e25b3 delay_halt.part.0+0x33
ffffffffc03f6188 mgag200_bmc_stop_scanout+0x178
ffffffffc087ae7a disable_outputs+0x12a
ffffffffc087c12a drm_atomic_helper_commit_tail+0x1a
ffffffffc03fa7b6 mgag200_mode_config_helper_atomic_commit_tail+0x26
ffffffffc087c9c1 commit_tail+0x91
ffffffffc087d51b drm_atomic_helper_commit+0x11b
ffffffffc0509694 drm_atomic_commit+0xa4
ffffffffc05105e8 drm_client_modeset_commit_atomic+0x1e8
ffffffffc0510ce6 drm_client_modeset_commit_locked+0x56
ffffffffc0510e24 drm_client_modeset_commit+0x24
ffffffffc088a743 __drm_fb_helper_restore_fbdev_mode_unlocked+0x93
ffffffffc088a683 drm_fb_helper_hotplug_event+0xe3
ffffffffc050f8aa drm_client_dev_hotplug+0x9a
ffffffffc088555a output_poll_execute+0x29a
ffffffffa9b35924 process_one_work+0x194
ffffffffa9b364ee worker_thread+0x2fe
ffffffffa9b3ecad kthread+0xdd
ffffffffa9a08549 ret_from_fork+0x29
On a server running ptp4l with the mgag200 driver loaded, we found that
ptp4l would sometimes get blocked from execution because of this busy
waiting loop.
Every so often, approximately once every 20 minutes -- though with large
variance -- the output_poll_execute() thread would detect some sort of
change that required performing a hotplug event which results in attempting
to stop the BMC scanout, resulting in a 300msec delay on one CPU.
On this system, ptp4l was pinned to a single CPU. When the
output_poll_execute() thread ran on that CPU, it blocked ptp4l from
executing for its 300 millisecond duration.
This resulted in PTP service disruptions such as failure to send a SYNC
message on time, failure to handle ANNOUNCE messages on time, and clock
check warnings from the application. All of this despite the application
being configured with FIFO_RT and a higher priority than the background
workqueue tasks. (However, note that the kernel did not use
CONFIG_PREEMPT...)
It is unclear if the event is due to a faulty VGA connection, another bug,
or actual events causing a change in the connection. At least on the system
under test it is not a one-time event and consistently causes disruption to
the time sensitive applications.
The function has some helpful comments explaining what steps it is
attempting to take. In particular, step 3a and 3b are explained as such:
3a - The third step is to verify if there is an active scan. We are
waiting on a 0 on remhsyncsts (<XSPAREREG<0>.
3b - This step occurs only if the remove is actually scanning. We are
waiting for the end of the frame which is a 1 on remvsyncsts
(<XSPAREREG<1>).
The actual steps 3a and 3b are implemented as while loops with a
non-sleeping udelay(). The first step iterates while the tmp value at
position 0 is *not* set. That is, it keeps iterating as long as the bit is
zero. If the bit is already 0 (because there is no active scan), it will
iterate the entire 300 attempts which wastes 300 milliseconds in total.
This is opposite of what the description claims.
The step 3b logic only executes if we do not iterate over the entire 300
attempts in the first loop. If it does trigger, it is trying to check and
wait for a 1 on the remvsyncsts. However, again the condition is actually
inverted and it will loop as long as the bit is 1, stopping once it hits
zero (rather than the explained attempt to wait until we see a 1).
Worse, both loops are implemented using non-sleeping waits which spin
instead of allowing the scheduler to run other processes. If the kernel is
not configured to allow arbitrary preemption, it will waste valuable CPU
time doing nothing.
There does not appear to be any documentation for the BMC register
interface, beyond what is in the comments here. It seems more probable that
the comment here is correct and the implementation accidentally got
inverted from the intended logic.
Reading through other DRM driver implementations, it does not appear that
the .atomic_enable or .atomic_disable handlers need to delay instead of
sleep. For example, the ast_astdp_encoder_helper_atomic_disable() function
calls ast_dp_set_phy_sleep() which uses msleep(). The "atomic" in the name
is referring to the atomic modesetting support, which is the support to
enable atomic configuration from userspace, and not to the "atomic context"
of the kernel. There is no reason to use udelay() here if a sleep would be
sufficient.
Replace the while loops with a read_poll_timeout() based implementation
that will sleep between iterations, and which stops polling once the
condition is met (instead of looping as long as the condition is met). This
aligns with the commented behavior and avoids blocking on the CPU while
doing nothing.
Note the RREG_DAC is implemented using a statement expression to allow
working properly with the read_poll_timeout family of functions. The other
RREG_<TYPE> macros ought to be cleaned up to have better semantics, and
several places in the mgag200 driver could make use of RREG_DAC or similar
RREG_* macros should likely be cleaned up for better semantics as well, but
that task has been left as a future cleanup for a non-bugfix.
Fixes: 414c45310625 ("mgag200: initial g200se driver (v2)")
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/drivers
FSL SOC Changes for 6.20
Freescale Management Complex:
- Convert fsl-mc bus to bus callbacks
- Fix a use-after-free
- Drop redundant error messages
- Fix ressources release on some error path
Freescale QUICC Engine:
- Add an interrupt controller for IO Ports
- Use scoped for-each OF child loop
* tag 'soc_fsl-6.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux:
bus: fsl-mc: fix an error handling in fsl_mc_device_add()
soc: fsl: qe: qe_ports_ic: Consolidate chained IRQ handler install/remove
dt-bindings: soc: fsl: qe: Add an interrupt controller for QUICC Engine Ports
soc: fsl: qe: Add an interrupt controller for QUICC Engine Ports
soc: fsl: qe: Simplify with scoped for each OF child loop
bus: fsl-mc: fix use-after-free in driver_override_show()
bus: fsl-mc: Convert to bus callbacks
bus: fsl-mc: Drop error message in probe function
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into soc/dt
SoCFPGA DTS updates for v6.20, version 3
- dt-bindings updates:
- Add intel,socfpga-agilex5-socdk-modular for the Agilex5 mod board
- Add intel,socfpga-agilex-emmc for the Agilex eMMC daughter board
- Move entries in intel,socfpga.yaml into altera.yaml
- Add syscon as a fallback for sys-mgr
- Add dma-cohrerent property for Agilex5 NAND and DMA
- Add support for the Agilex5 modular board
- Add IOMMUS property for ethernet nodes for Agilex5
- Use lowercase hex for dts files
- Add #address-cells and #size-cells for sram
- Fix dtbs_check warning for fpga-region
- Move dma controller node for Agilex5 under simple-bus
- Add support for the Agilex eMMC daughter board
* tag 'socfpga_dts_updates_for_v6.20_v3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
dt-bindings: intel: Add Agilex eMMC support
arm64: dts: socfpga: agilex: add emmc support
arm64: dts: intel: agilex5: Add simple-bus node on top of dma controller node
ARM: dts: socfpga: fix dtbs_check warning for fpga-region
ARM: dts: socfpga: add #address-cells and #size-cells for sram node
dt-bindings: altera: document syscon as fallback for sys-mgr
arm64: dts: altera: Use lowercase hex
dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml
arm64: dts: socfpga: agilex5: Add IOMMUS property for ethernet nodes
arm64: dts: socfpga: agilex5: add support for modular board
dt-bindings: intel: Add Agilex5 SoCFPGA modular board
arm64: dts: socfpga: agilex5: Add dma-coherent property
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Jan Höppner says:
====================
Quite a lot of the tape device driver code is outdated as devices and
storage systems supported by that code aren't supported by IBM anymore for
a long time. Especially physical tape devices are not supported or used
directly anymore.
The only tape storage system supported by IBM is the Virtual Tape Server
(VTS) family with TS7700 systems [1]. Host systems will only talk to VTS
and are presented with the virtualized 3490E tape device type only. VTS can
and still uses tape libraries with physical 3592 cartridges as storage
backends (e.g. TS4500). However, these are never seen by any host.
The general goal/idea for the tape device driver is to only support VTS
from now on. This series gets rid of old outdated code that is not
relevant to VTS. There is probably quite a bit more that could be
cleaned up or could be improved. However, this is a first run to cleanup
the code base and somewhat reduce maintenance burden.
[1] https://www.ibm.com/products/ts7700
[2] https://www.ibm.com/products/ts4500
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The driver now exclusively supports 3490 tape devices, given support for
3480 tape devices has been removed. Update the device driver name, its
source file name, and change any occurrences of "34xx/34XX" to "3490" in
the source code and comments.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Quite a few Error Recovery Action (ERA) codes and sense data entries are
not relevant anymore for the Virtual Tape Server (VTS) and are not being
used by VTS. Most of them were relevant for actual physical errors when
a tape cartridge got stuck or a tape didn't rewind properly for example.
Remove these codes from the sense data analysis as it's dead code
anyway.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The only supported device type by the Virtual Tape Server is 3490. The
3480 device type was an old physical tape model and doesn't exist
anymore. Remove 3480 from the list and any mention of it.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Quite a few command definitions are either not used or don't exist for
3490 tape devices on Virtual Tape Servers (VTS). Cleanup the list, which
makes it easier to understand what commands are actually implemented by
the driver. The lists below outline the exact reason for the removal.
Description for existing commands are adapted to reflect the removal of
support for old device types.
The following commands don't exist in VTS for 3490 devices and are
unused:
INVALID_00 0x00
DIAG_MODE_SET 0x0B
FORCE_STREAM_CNT 0xEB
LOOP_WRITE_TO_READ 0x8B
MODE_SET_C3 0xC3
MODE_SET_CB 0xCB
MODE_SET_D3 0xD3
NEW_MODE_SET 0xEB
RELEASE 0xD4
REQ_TRK_IN_ERROR 0x1B
RESERVE 0xF4
SET_DIAGNOSE 0x4B
The following command definitions are not used:
CONTROL_ACCESS 0xE3
PERF_SUBSYS_FUNC 0x77
READ_BACKWARD 0x0C
READ_BUFFER 0x12
READ_BUFF_LOG 0x24
READ_CONFIG_DATA 0xFA
READ_DEV_CHAR 0x64
READ_MESSAGE_ID 0x4E
READ_SUBSYS_DATA 0x3E
SENSE_GROUP_ID 0x34
SENSE_ID 0xE4
SET_GROUP_ID 0xAF
SET_INTERFACE_ID 0x73
SET_TAPE_WRITE_IMMED 0xC3
SUSPEND 0x5B
SYNC 0x43
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
For real 3490 tape models a logical block was composed of a direction
bit (wrap), a segment number, a format mode, and a logical block number.
This is represented in a 4byte identifier.
The Virtual Tape Server (VTS) emulates 3490 tape devices and uses a
stripped block id format where bit 0-9 of the 4byte identifier are
always 0. Bit 10-31 represent the logical block number.
All tapes use the 3480-2 XF format, which was defined via
TAPE34XX_FMT_3480_2_XF but never used. There is also no special handling
required anymore as this is the only format being used.
Since VTS doesn't require any special handling of block ids and format,
corresponding code is removed.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The LOAD_DISPLAY (LDD) X'9F' is still accepted by the Virtual Tape
Server (VTS) but does not perform any action.
Remove all functions and definitions related to this command.
The tape_34xx_ioctl() function is also removed as it was mainly used to
handle additional ioctl functionality. LOAD_DISPLAY was the only left
case. All other ioctls are handled in tapechar_ioctl().
With LOAD_DISPLAY, the remaining definitions in asm/tape390.h are gone.
Delete the file.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Physical 3590/3592 tape models are not supported anymore for a very long
time. The Virtual Tape Server (VTS) emulates and presents only 3490E
models to the host. This is the only supported model and storage server.
Remove the entire code base for 3590/3592 models as it can be considered
dead code for quite some time already.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Variable arrays should be defined as [], not [0], otherwise
the kernel complains:
memcpy : detected field-spanning write (size 9) of single field "param->set_program_ex.data" at drivers/platform/chrome/cros_ec_lightbar.c:603 (size 0)
Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence")
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Link: https://lore.kernel.org/r/20260204034848.697033-1-gwendal@google.com
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
|
|
Kevin Hao says:
====================
net: cpsw: Execute ndo_set_rx_mode callback in a work queue
These two patches resolve an RTNL assertion call trace issue
in both the legacy and new cpsw drivers.
====================
Link: https://patch.msgid.link/20260203-bbb-v5-0-ea0ea217a85c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 1767bb2d47b7 ("ipv6: mcast: Don't hold RTNL for
IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.") removed the RTNL lock for
IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP operations. However, this
change triggered the following call trace on my BeagleBone Black board:
WARNING: net/8021q/vlan_core.c:236 at vlan_for_each+0x120/0x124, CPU#0: rpcbind/481
RTNL: assertion failed at net/8021q/vlan_core.c (236)
Modules linked in:
CPU: 0 UID: 997 PID: 481 Comm: rpcbind Not tainted 6.19.0-rc7-next-20260130-yocto-standard+ #35 PREEMPT
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x28/0x2c
show_stack from dump_stack_lvl+0x30/0x38
dump_stack_lvl from __warn+0xb8/0x11c
__warn from warn_slowpath_fmt+0x130/0x194
warn_slowpath_fmt from vlan_for_each+0x120/0x124
vlan_for_each from cpsw_add_mc_addr+0x54/0x98
cpsw_add_mc_addr from __hw_addr_ref_sync_dev+0xc4/0xec
__hw_addr_ref_sync_dev from __dev_mc_add+0x78/0x88
__dev_mc_add from igmp6_group_added+0x84/0xec
igmp6_group_added from __ipv6_dev_mc_inc+0x1fc/0x2f0
__ipv6_dev_mc_inc from __ipv6_sock_mc_join+0x124/0x1b4
__ipv6_sock_mc_join from do_ipv6_setsockopt+0x84c/0x1168
do_ipv6_setsockopt from ipv6_setsockopt+0x88/0xc8
ipv6_setsockopt from do_sock_setsockopt+0xe8/0x19c
do_sock_setsockopt from __sys_setsockopt+0x84/0xac
__sys_setsockopt from ret_fast_syscall+0x0/0x54
This trace occurs because vlan_for_each() is called within
cpsw_ndo_set_rx_mode(), which expects the RTNL lock to be held.
Since modifying vlan_for_each() to operate without the RTNL lock is not
straightforward, and because ndo_set_rx_mode() is invoked both with and
without the RTNL lock across different code paths, simply adding
rtnl_lock() in cpsw_ndo_set_rx_mode() is not a viable solution.
To resolve this issue, we opt to execute the actual processing within
a work queue, following the approach used by the icssg-prueth driver.
Please note: To reproduce this issue, I manually reverted the changes to
am335x-bone-common.dtsi from commit c477358e66a3 ("ARM: dts: am335x-bone:
switch to new cpsw switch drv") in order to revert to the legacy cpsw
driver.
Fixes: 1767bb2d47b7 ("ipv6: mcast: Don't hold RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260203-bbb-v5-2-ea0ea217a85c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 1767bb2d47b7 ("ipv6: mcast: Don't hold RTNL for
IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.") removed the RTNL lock for
IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP operations. However, this
change triggered the following call trace on my BeagleBone Black board:
WARNING: net/8021q/vlan_core.c:236 at vlan_for_each+0x120/0x124, CPU#0: rpcbind/496
RTNL: assertion failed at net/8021q/vlan_core.c (236)
Modules linked in:
CPU: 0 UID: 997 PID: 496 Comm: rpcbind Not tainted 6.19.0-rc6-next-20260122-yocto-standard+ #8 PREEMPT
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x28/0x2c
show_stack from dump_stack_lvl+0x30/0x38
dump_stack_lvl from __warn+0xb8/0x11c
__warn from warn_slowpath_fmt+0x130/0x194
warn_slowpath_fmt from vlan_for_each+0x120/0x124
vlan_for_each from cpsw_add_mc_addr+0x54/0xd8
cpsw_add_mc_addr from __hw_addr_ref_sync_dev+0xc4/0xec
__hw_addr_ref_sync_dev from __dev_mc_add+0x78/0x88
__dev_mc_add from igmp6_group_added+0x84/0xec
igmp6_group_added from __ipv6_dev_mc_inc+0x1fc/0x2f0
__ipv6_dev_mc_inc from __ipv6_sock_mc_join+0x124/0x1b4
__ipv6_sock_mc_join from do_ipv6_setsockopt+0x84c/0x1168
do_ipv6_setsockopt from ipv6_setsockopt+0x88/0xc8
ipv6_setsockopt from do_sock_setsockopt+0xe8/0x19c
do_sock_setsockopt from __sys_setsockopt+0x84/0xac
__sys_setsockopt from ret_fast_syscall+0x0/0x5
This trace occurs because vlan_for_each() is called within
cpsw_ndo_set_rx_mode(), which expects the RTNL lock to be held.
Since modifying vlan_for_each() to operate without the RTNL lock is not
straightforward, and because ndo_set_rx_mode() is invoked both with and
without the RTNL lock across different code paths, simply adding
rtnl_lock() in cpsw_ndo_set_rx_mode() is not a viable solution.
To resolve this issue, we opt to execute the actual processing within
a work queue, following the approach used by the icssg-prueth driver.
Fixes: 1767bb2d47b7 ("ipv6: mcast: Don't hold RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260203-bbb-v5-1-ea0ea217a85c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Several registers referenced in this driver's source code do not
actually exist (they are not writable and read as zero in my testing).
They exist in this driver because it originated as a copy of the dm9601
driver. Notably, these include the multicast filter registers - this
causes the driver to not support multicast packets correctly. Remove
the multicast filter code and register definitions. Instead, set the
chip to receive all multicast filter packets when any multicast
addresses are in the list.
Reviewed-by: Simon Horman <horms@kernel.org> (from v1)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260203013924.28582-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Many USB network drivers use identical code to pass ioctl
requests on to the MII layer. Reduce code duplication by
refactoring this code into a helper function.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> (v1)
Reviewed-by: Andrew Lunn <andrew@lunn.ch> (v3)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260203013517.26170-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Niklas Söderlund says:
====================
net: ethernet: renesas: rcar_gen4_ptp: Hide private data
The R-Car Gen4 PTP module started out as an exclusive feature of a
single driver, but have since been extended to cover both R-Car Switch
and TSN driver implementations on Gen4.
The feature have already been extended to be built as its own module
with an interface exposed thru a local header file. The header file
however also exposes the modules private data structure. The two
existing users have already started to poke at members of the struct.
The exposed private data being manipulated by users makes refactoring
and future rework hard as the interface for the module becomes to
chaotic. This small series aims to create two helpers to hide the
private data.
This is done as a small preparation before a third, new, users of the
Gen4 PTP will be added in a follow up series.
====================
Link: https://patch.msgid.link/20260201183745.1075399-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Gen4 PTP helper module is already used by RTSN and RSWITCH to
support PTP clocks and will be used by RAVB too. Hide the Gen4 PTP
private data structure to make sure none of the users poke at it.
This will be more important for RAVB use-cases as more then one RAVB
device will need to cooperate using one PTP clock source.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260201183745.1075399-5-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|