<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/md.c, branch v6.19</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md: Fix forward incompatibility from configurable logical block size</title>
<updated>2025-12-27T02:14:07+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-12-26T02:42:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a4166f1c4893a9a620507255b9d1ccab44fab189'/>
<id>a4166f1c4893a9a620507255b9d1ccab44fab189</id>
<content type='text'>
Commit 62ed1b582246 ("md: allow configuring logical block size") used a
reserved pad field to add 'logical_block_size' to the metadata. RAID
rejects a non-zero reserved pad, so arrays fail to assemble when rolling
back to an old kernel after booting a new one.

Set 'logical_block_size' only for newly created arrays so that existing
arrays can still be rolled back to old kernels. Importantly, new arrays
still won't work on old kernels, which prevents data loss from LBS
changes.

For arrays created on old kernels that are confirmed never to be rolled
back, LBS can be configured by echoing the current LBS
(queue/logical_block_size) to md/logical_block_size.
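
A minimal sketch of that step, assuming the array is md0 and the usual md
sysfs layout; the shell variable is only illustrative:

```
  # read the LBS currently reported by the block layer (assumes /dev/md0)
  lbs=$(cat /sys/block/md0/queue/logical_block_size)
  # write the same value to the md attribute so it is recorded in metadata
  echo "$lbs" &gt; /sys/block/md0/md/logical_block_size
```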

Fixes: 62ed1b582246 ("md: allow configuring logical block size")
Reported-by: BugReports &lt;bugreports61@gmail.com&gt;
Closes: https://lore.kernel.org/linux-raid/825e532d-d1e1-44bb-5581-692b7c091796@huaweicloud.com/T/#t
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/linux-raid/20251226024221.724201-2-linan666@huaweicloud.com
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: Fix logical_block_size configuration being overwritten</title>
<updated>2025-12-27T02:13:40+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-12-26T02:42:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=864466c38c4a0446088a8e866538c83c3f2018cb'/>
<id>864466c38c4a0446088a8e866538c83c3f2018cb</id>
<content type='text'>
In super_1_validate(), mddev-&gt;logical_block_size is directly overwritten
with the value from the metadata. This causes the previously configured
LBS to be lost, making the configuration ineffective. Fix it.

Fixes: 62ed1b582246 ("md: allow configuring logical block size")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Link: https://lore.kernel.org/linux-raid/20251226024221.724201-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: suspend array while updating raid_disks via sysfs</title>
<updated>2025-12-27T01:54:50+00:00</updated>
<author>
<name>FengWei Shih</name>
<email>dannyshih@synology.com</email>
</author>
<published>2025-12-26T10:18:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2cc583653bbe050bacd1cadcc9776d39bf449740'/>
<id>2cc583653bbe050bacd1cadcc9776d39bf449740</id>
<content type='text'>
In raid1_reshape(), freeze_array() is called before modifying the r1bio
memory pool (conf-&gt;r1bio_pool) and conf-&gt;raid_disks, and
unfreeze_array() is called after the update is completed.

However, freeze_array() only waits until nr_sync_pending and
(nr_pending - nr_queued) of all buckets reach zero. When an I/O error
occurs, nr_queued is increased and the corresponding r1bio is queued to
either retry_list or bio_end_io_list. As a result, freeze_array() may
unblock before these r1bios are released.

This can lead to a situation where conf-&gt;raid_disks and the mempool have
already been updated while queued r1bios, allocated with the old
raid_disks value, are later released. Consequently, free_r1bio() may
access memory out of bounds in put_all_bios() and release r1bios of the
wrong size to the new mempool, potentially causing issues with the
mempool as well.

Since nr_queued is only increased by normal I/O that hits an I/O error,
suspending the array (which quiesces normal I/O) avoids this issue.

Note: Updating raid_disks via the SET_ARRAY_INFO ioctl already suspends
the array. Therefore, suspend the array when updating raid_disks via
sysfs as well, to avoid this issue.
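
For reference, a sketch of the sysfs update that is now wrapped in
suspend/resume; the device name and disk count are assumptions:

```
  # grow a RAID1 from 2 to 3 member slots via sysfs; with this patch the
  # array is suspended around the update, as it already is for the
  # SET_ARRAY_INFO ioctl
  echo 3 &gt; /sys/block/md0/md/raid_disks
```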

Signed-off-by: FengWei Shih &lt;dannyshih@synology.com&gt;
Link: https://lore.kernel.org/linux-raid/20251226101816.4506-1-dannyshih@synology.com
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: Fix static checker warning in analyze_sbs</title>
<updated>2025-12-25T07:45:21+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-12-15T12:44:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=00f6c1b4d15d35fadb7f34768a1831c81aaa8936'/>
<id>00f6c1b4d15d35fadb7f34768a1831c81aaa8936</id>
<content type='text'>
The following warning is reported:

 drivers/md/md.c:3912 analyze_sbs()
 warn: iterator 'i' not incremented

Fixes: d8730f0cf4ef ("md: Remove deprecated CONFIG_MD_MULTIPATH")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Closes: https://lore.kernel.org/linux-raid/7e2e95ce-3740-09d8-a561-af6bfb767f18@huaweicloud.com/T/#t
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/linux-raid/20251215124412.4015572-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: remove legacy 1s delay in md_notify_reboot</title>
<updated>2025-11-30T01:42:28+00:00</updated>
<author>
<name>Tarun Sahu</name>
<email>tarunsahu@google.com</email>
</author>
<published>2025-11-21T19:14:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fdd0c6a649d24107bbadd249c87feab67b9037c5'/>
<id>fdd0c6a649d24107bbadd249c87feab67b9037c5</id>
<content type='text'>
During system shutdown, the md driver's registered notifier function
(md_notify_reboot) currently imposes a hardcoded one-second delay.

This delay was introduced approximately 23 years ago and was likely
necessary for the hardware generation of that time. This patch removes
the delay to confirm that no known devices still need it.

Link: https://lore.kernel.org/linux-raid/20251121191422.2758555-1-tarunsahu@google.com
Signed-off-by: Tarun Sahu &lt;tarunsahu@google.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: warn about updating super block failure</title>
<updated>2025-11-30T01:38:22+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai@fnnas.com</email>
</author>
<published>2025-11-17T08:55:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8c9e376b9d1a222fa02b93b615d2e25be0a91fed'/>
<id>8c9e376b9d1a222fa02b93b615d2e25be0a91fed</id>
<content type='text'>
Many personalities handle IO errors from a daemon thread (such as raid1d,
raid10d and raid5d), and the superblock needs to be clean before these
failed IOs can be handled. However, updating the superblock can fail, for
example when the array is broken by IO failures, or when the user
configures array_state via the sysfs API.

This patch adds a warning when updating the superblock fails, in case
such a failure is related to an IO hang.

Link: https://lore.kernel.org/linux-raid/20251117085557.770572-2-yukuai@fnnas.com
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Reviewed-by: Li Nan &lt;linan122@huawei.com&gt;
</content>
</entry>
<entry>
<title>md: allow configuring logical block size</title>
<updated>2025-11-11T03:20:15+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-11-03T12:57:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=62ed1b58224636185fa689db81224b8c8af46473'/>
<id>62ed1b58224636185fa689db81224b8c8af46473</id>
<content type='text'>
Previously, a RAID array used the maximum logical block size (LBS)
of all member disks. Adding a disk with a larger LBS at runtime could
unexpectedly increase the array's LBS, risking corruption of existing
partitions. This can be reproduced by:

```
  # LBS of sd[de] is 512 bytes, sdf is 4096 bytes.
  mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean

  # LBS is 512
  cat /sys/block/md0/queue/logical_block_size

  # create partition md0p1
  parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100%
  lsblk | grep md0p1

  # LBS becomes 4096 after adding sdf
  mdadm --add -q /dev/md0 /dev/sdf
  cat /sys/block/md0/queue/logical_block_size

  # partition lost
  partprobe /dev/md0
  lsblk | grep md0p1
```

Simply rejecting disks with a larger LBS is inflexible. In some scenarios,
only 512-byte-LBS disks are available at creation time, but disks with a
4KB LBS may need to be added to the array later.

Making LBS configurable is the best way to handle this scenario.
After this patch, the RAID array will:
  - store the LBS in the on-disk metadata
  - expose a read-write sysfs attribute 'mdX/logical_block_size'

A future mdadm should support setting the LBS via the metadata field at
RAID creation time and via the new sysfs attribute. Although the kernel
allows runtime LBS changes, users should avoid modifying the LBS after
creating partitions or filesystems, to prevent compatibility issues.
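
A workflow sketch building on the reproducer above, assuming the same
device names and that a runtime write to md/logical_block_size takes
effect immediately:

```
  # pin the array's LBS to 4096 before writing any partition table, so a
  # 4K-LBS disk can be added later without changing the array's LBS
  mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean
  echo 4096 &gt; /sys/block/md0/md/logical_block_size
  cat /sys/block/md0/queue/logical_block_size   # expected to report 4096
  parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100%
  mdadm --add -q /dev/md0 /dev/sdf              # 4K-LBS disk; partitions survive
```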

Only 1.x metadata supports a configurable LBS. 0.90 metadata initializes
all fields to default values at auto-detect time. Supporting 0.90 would
require more extensive changes, and no such use case has been observed.

Note that many RAID paths rely on PAGE_SIZE alignment, including metadata
I/O. An LBS larger than PAGE_SIZE would result in metadata read/write
failures, so this configuration must be prevented.

Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-6-linan666@huaweicloud.com
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: add check_new_feature module parameter</title>
<updated>2025-11-11T03:19:54+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-11-03T12:57:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9c47127a807da3e36ce80f7c83a1134a291fc021'/>
<id>9c47127a807da3e36ce80f7c83a1134a291fc021</id>
<content type='text'>
RAID checks that pad3 is zero when loading the superblock from disk.
Arrays created with new features may therefore fail to assemble on old
kernels once pad3 is used.

Add a module parameter, check_new_feature, to bypass this check.
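
A usage sketch, assuming md is built as the md_mod module and that the
parameter is exposed in the usual /sys/module location; the value written
(0) and its meaning are assumptions:

```
  # assumed semantics: 0 skips the pad3 check at assembly time
  modprobe md_mod check_new_feature=0
  # or at runtime, if the parameter is writable
  echo 0 &gt; /sys/module/md_mod/parameters/check_new_feature
```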

Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-5-linan666@huaweicloud.com
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: init bioset in mddev_init</title>
<updated>2025-11-11T03:19:10+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-11-03T12:57:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=381a3ce1c0ffed647c9b913e142b099c7e9d5afc'/>
<id>381a3ce1c0ffed647c9b913e142b099c7e9d5afc</id>
<content type='text'>
IO operations may be needed before md_run(), such as updating metadata
after a sysfs write. Without the bioset initialized, this triggers a NULL
pointer dereference as below:

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 Call Trace:
  md_update_sb+0x658/0xe00
  new_level_store+0xc5/0x120
  md_attr_store+0xc9/0x1e0
  sysfs_kf_write+0x6f/0xa0
  kernfs_fop_write_iter+0x141/0x2a0
  vfs_write+0x1fc/0x5a0
  ksys_write+0x79/0x180
  __x64_sys_write+0x1d/0x30
  x64_sys_call+0x2818/0x2880
  do_syscall_64+0xa9/0x580
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Reproducer:
```
  mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
  echo inactive &gt; /sys/block/md0/md/array_state
  echo 10 &gt; /sys/block/md0/md/new_level
```

mddev_init() is only called once per mddev, so there is no longer any
need to test whether the bioset has already been initialized.

Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-3-linan666@huaweicloud.com
Fixes: d981ed841930 ("md: Add new_level sysfs interface")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
<entry>
<title>md: delete md_redundancy_group when array is becoming inactive</title>
<updated>2025-11-11T03:18:51+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2025-11-03T12:57:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0ce112d9171ad766d4c6716951e73f91a0bfc184'/>
<id>0ce112d9171ad766d4c6716951e73f91a0bfc184</id>
<content type='text'>
'md_redundancy_group' is created in md_run() and deleted in del_gendisk(),
but these operations are not paired. Writing inactive/active to the sysfs
array_state attribute can trigger md_run() multiple times without
del_gendisk(), leading to duplicate creation as below:

 sysfs: cannot create duplicate filename '/devices/virtual/block/md0/md/sync_action'
 Call Trace:
  dump_stack_lvl+0x9f/0x120
  dump_stack+0x14/0x20
  sysfs_warn_dup+0x96/0xc0
  sysfs_add_file_mode_ns+0x19c/0x1b0
  internal_create_group+0x213/0x830
  sysfs_create_group+0x17/0x20
  md_run+0x856/0xe60
  ? __x64_sys_openat+0x23/0x30
  do_md_run+0x26/0x1d0
  array_state_store+0x559/0x760
  md_attr_store+0xc9/0x1e0
  sysfs_kf_write+0x6f/0xa0
  kernfs_fop_write_iter+0x141/0x2a0
  vfs_write+0x1fc/0x5a0
  ksys_write+0x79/0x180
  __x64_sys_write+0x1d/0x30
  x64_sys_call+0x2818/0x2880
  do_syscall_64+0xa9/0x580
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 md: cannot register extra attributes for md0

Its creation depends on 'pers', so its lifecycle cannot be aligned with
the gendisk. Fix this issue by deleting 'md_redundancy_group' when the
array becomes inactive.
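
A reproduction sketch of the toggle described above (pre-fix behaviour;
device names are assumptions):

```
  mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
  echo inactive &gt; /sys/block/md0/md/array_state
  # re-running the array without del_gendisk() used to trigger the
  # duplicate sysfs warning shown above
  echo active &gt; /sys/block/md0/md/array_state
```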

Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-2-linan666@huaweicloud.com
Fixes: 790abe4d77af ("md: remove/add redundancy group only in level change")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
</content>
</entry>
</feed>
