summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2026-04-08ecryptfs: factor out a ecryptfs_iattr_to_lower helperChristoph Hellwig
Prepare for using the code to create a lower struct iattr in multiple places. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08ecryptfs: merge ecryptfs_inode_newsize_ok into truncate_upperChristoph Hellwig
Both callers of ecryptfs_inode_newsize_ok call truncate_upper right after. Merge ecryptfs_inode_newsize_ok into truncate_upper to simplify the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08ecryptfs: combine the two ATTR_SIZE blocks in ecryptfs_setattrChristoph Hellwig
Simplify the logic in ecryptfs_setattr by combining the two ATTR_SIZE blocks. This initializes lower_ia before the size check, which is obviously correct as the size check doesn't look at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08ecryptfs: use ZERO_PAGE instead of allocating zeroed memory in truncate_upperChristoph Hellwig
Use the existing pre-zeroed memory instead of allocating a new chunk. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08ecryptfs: streamline truncate_upperChristoph Hellwig
Use a few strategic gotos to reduce indentation and keep the main flow outside of branches. Switch all touched comments to normal kernel style and avoid breaks in printed strings for all the code touched. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08ecryptfs: cleanup ecryptfs_setattrChristoph Hellwig
Initialize variables at declaration time where applicable and reformat conditionals to match the kernel coding style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-04-08hfsplus: fix generic/642 failureViacheslav Dubeyko
The xfstests' test-case generic/642 finishes with corrupted HFS+ volume: sudo ./check generic/642 [sudo] password for slavad: FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 7.0.0-rc1+ #26 SMP PREEMPT_DYNAMIC Mon Mar 23 17:24:32 PDT 2026 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/642 6s ... _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent (see xfstests-dev/results//generic/642.full for details) Ran: generic/642 Failures: generic/642 Failed 1 of 1 tests sudo fsck.hfs -d /dev/loop51 ** /dev/loop51 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. ** Checking extended attributes file. invalid free nodes - calculated 1637 header 1260 Invalid B-tree header Invalid map node (8, 0) ** Checking volume bitmap. ** Checking volume information. Verify Status: VIStat = 0x0000, ABTStat = 0xc000 EBTStat = 0x0000 CBTStat = 0x0000 CatStat = 0x00000000 ** Repairing volume. ** Rechecking volume. ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. ** The volume untitled was repaired successfully. The fsck tool detected that Extended Attributes b-tree is corrupted. Namely, the free nodes number is incorrect and map node bitmap has inconsistent state. Analysis has shown that during b-tree closing there are still some lost b-tree's nodes in the hash out of b-tree structure. But this orphaned b-tree nodes are still accounted as used in map node bitmap: tree_cnid 8, nidx 0, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 1, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 3, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 54, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 67, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 0, prev 0, next 0, parent 0, num_recs 3, type 0x1, height 0 tree_cnid 8, nidx 1, prev 0, next 0, parent 3, num_recs 1, type 0xff, height 1 tree_cnid 8, nidx 3, prev 0, next 0, parent 0, num_recs 1, type 0x0, height 2 tree_cnid 8, nidx 54, prev 29, next 46, parent 3, num_recs 0, type 0xff, height 1 tree_cnid 8, nidx 67, prev 8, next 14, parent 3, num_recs 0, type 0xff, height 1 This issue happens in hfs_bnode_split() logic during detection the possibility of moving half ot the records out of the node. The hfs_bnode_split() contains a loop that implements a roughly 50/50 split of the B-tree node's records by scanning the offset table to find where the data crosses the node's midpoint. If this logic detects the incapability of spliting the node, then it simply calls hfs_bnode_put() for newly created node. However, node is not set as HFS_BNODE_DELETED and real deletion of node doesn't happen. As a result, the empty node becomes orphaned but it is still accounted as used. Finally, fsck tool detects this inconsistency of HFS+ volume. This patch adds call of hfs_bnode_unlink() before hfs_bnode_put() for the case if new node cannot be used for spliting the existing node. sudo ./check generic/642 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 7.0.0-rc1+ #26 SMP PREEMPT_DYNAMIC Fri Apr 3 12:39:13 PDT 2026 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/642 40s ... 39s Ran: generic/642 Passed all 1 tests Closes: https://github.com/hfs-linux-kernel/hfs-linux-kernel/issues/242 cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-6-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-08hfsplus: rework logic of map nodes creation in xattr b-treeViacheslav Dubeyko
In hfsplus_init_header_node() when node_count > 63488 (header bitmap capacity), the code calculates map_nodes, subtracts them from free_nodes, and marks their positions used in the bitmap. However, it doesn't write the actual map node structure (type, record offsets, bitmap) for those physical positions, only node 0 is written. This patch reworks hfsplus_create_attributes_file() logic by introducing a specialized method of hfsplus_init_map_node() and writing the allocated map b-tree's nodes by means of hfsplus_write_attributes_file_node() method. cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-5-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-08hfsplus: fix logic of alloc/free b-tree nodeViacheslav Dubeyko
The hfs_bmap_alloc() and hfs_bmap_free() modify the b-tree's counters and nodes' bitmap of b-tree. However, hfs_btree_write() synchronizes the state of in-core b-tree's counters and node's bitmap with b-tree's descriptor in header node. Postponing this synchronization could result in inconsistent state of file system volume. This patch adds calling of hfs_btree_write() in hfs_bmap_alloc() and hfs_bmap_free() methods. cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-4-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-08hfsplus: fix error processing issue in hfs_bmap_free()Viacheslav Dubeyko
Currently, we check only -EINVAL error code in hfs_bmap_free() after calling the hfs_bmap_clear_bit(). It means that other error codes will be silently ignored. This patch adds the checking of all other error codes. cc: Shardul Bankar <shardul.b@mpiricsoftware.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-3-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-08hfsplus: fix potential race conditions in b-tree functionalityViacheslav Dubeyko
The HFS_BNODE_DELETED flag is checked in hfs_bnode_put() under locked tree->hash_lock. This patch adds locking for the case of setting the HFS_BNODE_DELETED flag in hfs_bnode_unlink() with the goal to avoid potential race conditions. The hfs_btree_write() method should be called under tree->tree_lock. This patch reworks logic by adding locking the tree->tree_lock for the calls of hfs_btree_write() in hfsplus_cat_write_inode() and hfsplus_system_write_inode(). This patch adds also the lockdep_assert_held() in hfs_bmap_reserve(), hfs_bmap_alloc(), and hfs_bmap_free(). cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-2-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-08tracefs: Fix default permissions not being applied on initial mountDavid Carlier
Commit e4d32142d1de ("tracing: Fix tracefs mount options") moved the option application from tracefs_fill_super() to tracefs_reconfigure() called from tracefs_get_tree(). This fixed mount options being ignored on user-space mounts when the superblock already exists, but introduced a regression for the initial kernel-internal mount. On the first mount (via simple_pin_fs during init), sget_fc() transfers fc->s_fs_info to sb->s_fs_info and sets fc->s_fs_info to NULL. When tracefs_get_tree() then calls tracefs_reconfigure(), it sees a NULL fc->s_fs_info and returns early without applying any options. The root inode keeps mode 0755 from simple_fill_super() instead of the intended TRACEFS_DEFAULT_MODE (0700). Furthermore, even on subsequent user-space mounts without an explicit mode= option, tracefs_apply_options(sb, true) gates the mode behind fsi->opts & BIT(Opt_mode), which is unset for the defaults. So the mode is never corrected unless the user explicitly passes mode=0700. Restore the tracefs_apply_options(sb, false) call in tracefs_fill_super() to apply default permissions on initial superblock creation, matching what debugfs does in debugfs_fill_super(). Cc: stable@vger.kernel.org Fixes: e4d32142d1de ("tracing: Fix tracefs mount options") Link: https://patch.msgid.link/20260404134747.98867-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-04-08mm: add gpu active/reclaim per-node stat counters (v2)Dave Airlie
While discussing memcg intergration with gpu memory allocations, it was pointed out that there was no numa/system counters for GPU memory allocations. With more integrated memory GPU server systems turning up, and more requirements for memory tracking it seems we should start closing the gap. Add two counters to track GPU per-node system memory allocations. The first is currently allocated to GPU objects, and the second is for memory that is stored in GPU page pools that can be reclaimed, by the shrinker. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-04-07smb: client: fix OOB reads parsing symlink error responseGreg Kroah-Hartman
When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message() returns success without any length validation, leaving the symlink parsers as the only defense against an untrusted server. symlink_data() walks SMB 3.1.1 error contexts with the loop test "p < end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset 0. When the server-controlled ErrorDataLength advances p to within 1-7 bytes of end, the next iteration will read past it. When the matching context is found, sym->SymLinkErrorTag is read at offset 4 from p->ErrorContextData with no check that the symlink header itself fits. smb2_parse_symlink_response() then bounds-checks the substitute name using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from iov_base. That value is computed as sizeof(smb2_err_rsp) + sizeof(smb2_symlink_err_rsp), which is correct only when ErrorContextCount == 0. With at least one error context the symlink data sits 8 bytes deeper, and each skipped non-matching context shifts it further by 8 + ALIGN(ErrorDataLength, 8). The check is too short, allowing the substitute name read to run past iov_len. The out-of-bound heap bytes are UTF-16-decoded into the symlink target and returned to userspace via readlink(2). Fix this all up by making the loops test require the full context header to fit, rejecting sym if its header runs past end, and bound the substitute name against the actual position of sym->PathBuffer rather than a fixed offset. Because sub_offs and sub_len are 16bits, the pointer math will not overflow here with the new greater-than. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: Bharath SM <bharathsm@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable <stable@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-07smb: client: fix off-by-8 bounds check in check_wsl_eas()Greg Kroah-Hartman
The bounds check uses (u8 *)ea + nlen + 1 + vlen as the end of the EA name and value, but ea_data sits at offset sizeof(struct smb2_file_full_ea_info) = 8 from ea, not at offset 0. The strncmp() later reads ea->ea_data[0..nlen-1] and the value bytes follow at ea_data[nlen+1..nlen+vlen], so the actual end is ea->ea_data + nlen + 1 + vlen. Isn't pointer math fun? The earlier check (u8 *)ea > end - sizeof(*ea) only guarantees the 8-byte header is in bounds, but since the last EA is placed within 8 bytes of the end of the response, the name and value bytes are read past the end of iov. Fix this mess all up by using ea->ea_data as the base for the bounds check. An "untrusted" server can use this to leak up to 8 bytes of kernel heap into the EA name comparison and influence which WSL xattr the data is interpreted as. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: Bharath SM <bharathsm@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable <stable@kernel.org> Assisted-by: gregkh_clanker_t1000 Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-07gfs2: prevent NULL pointer dereference during unmountAndreas Gruenbacher
When flushing out outstanding glock work during an unmount, gfs2_log_flush() can be called when sdp->sd_jdesc has already been deallocated and sdp->sd_jdesc is NULL. Commit 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") added a check for that to gfs2_log_flush() itself, but it missed the sdp->sd_jdesc dereference in gfs2_log_release(). Fix that. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202604071139.HNJiCaAi-lkp@intel.com/ Fixes: 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: hide error messages after withdrawAndreas Gruenbacher
In gfs2_evict_inode(), don't issue error messages once a withdraw has already occurred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: wait for withdraw earlier during unmountAndreas Gruenbacher
During an unmount, wait for potential withdraw to complete before calling gfs2_make_fs_ro(). This will allow gfs2_make_fs_ro() to skip much of its work. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: inode directory consistency checksAndreas Gruenbacher
In gfs2_dinode_in(), only allow directories to have the GFS2_DIF_EXHASH flag set. This will prevent other parts of the code from treating regular inodes as directories based on the presence of that flag. In sweep_bh_for_rgrps() and __gfs2_free_blocks(), check if the GFS2_DIF_EXHASH flag is set instead of checking if i_depth is non-zero. This matches what the directory code does. (The i_depth checks were introduced in commit 6d3117b412951 ("GFS2: Wipe directory hash table metadata when deallocating a directory").) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: gfs2_log_flush withdraw fixesAndreas Gruenbacher
When a withdraw occurs in gfs2_log_flush() and we are left with an unsubmitted bio, fail that bio. Otherwise, the bh's in that bio will remain locked and gfs2_evict_inode() -> truncate_inode_pages() -> gfs2_invalidate_folio() -> gfs2_discard() will hang trying to discard the locked bh's. In addition, when gfs2_log_flush() fails to submit a new transaction, unpin the buffers in the failing transaction like gfs2_remove_from_journal() does. If any of the bd's are on the ail2 list, leave them there and do_withdraw() -> gfs2_withdraw_glocks() -> inode_go_inval() -> truncate_inode_pages() -> gfs2_invalidate_folio() -> gfs2_discard() will remove them. They will be freed in gfs2_release_folio(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: add some missing log lockingAndreas Gruenbacher
Function gfs2_logd() calls the log flushing functions gfs2_ail1_start(), gfs2_ail1_wait(), and gfs2_ail1_empty() without holding sdp->sd_log_flush_lock, but these functions require exclusion against concurrent transactions. To fix that, add a non-locking __gfs2_log_flush() function. Then, in gfs2_logd(), take sdp->sd_log_flush_lock before calling the above mentioned log flushing functions and __gfs2_log_flush(). Fixes: 5e4c7632aae1c ("gfs2: Issue revokes more intelligently") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: fix address space truncation during withdrawAndreas Gruenbacher
When a withdrawn filesystem's inodes are being evicted, the address spaces of those inodes still need to be truncated but we can no longer start new transactions. We still don't want gfs2_invalidate_folio() to race with gfs2_log_flush(), so take a read lock on sdp->sd_log_flush_lock in that case. (It may not be obvious, but gfs2_invalidate_folio() is a jdata-only address space operation.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: drain ail under sd_log_flush_lockAndreas Gruenbacher
When a withdraw is carried out, call gfs2_ail_drain() under the sdp->sd_log_flush_lock. This isn't strictly necessary but should be easier to read, and more robust against possible future bugs. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07fs/resctrl: Add missing return value descriptionsReinette Chatre
Using the stricter "./tools/docs/kernel-doc -Wall -v" to verify proper formatting of documentation comments includes warnings related to return markup on functions that are omitted during the default verification checks. This stricter verification reports a couple of missing return descriptions in resctrl: Warning: .../fs/resctrl/rdtgroup.c:1536 No description found for return value of 'rdtgroup_cbm_to_size' Warning: .../fs/resctrl/rdtgroup.c:3131 No description found for return value of 'mon_get_kn_priv' Warning: .../fs/resctrl/rdtgroup.c:3523 No description found for return value of 'cbm_ensure_valid' Warning: .../fs/resctrl/monitor.c:238 No description found for return value of 'resctrl_find_cleanest_closid' Add the missing return descriptions. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/1c50b9f7c73251c007133590986f127e1af57780.1775576382.git.reinette.chatre@intel.com
2026-04-07btrfs: btrfs_log_dev_io_error() on all bio errorsBoris Burkov
As far as I can tell, we never intentionally constrained ourselves to these status codes, and it is misleading and surprising to lack the bdev error logging when we get a different error code from the block layer. This can lead to jumping to a wrong conclusion like "this system didn't see any bio failures but aborted with EIO". For example on nvme devices, I observe many failures coming back as BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of BLK_STS_* status values in nvme_error_status(). So handle the known expected errors and make some noise on the rest which we expect won't really happen. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: fix silent IO error loss in encoded writes and zoned splitMichal Grzedzicki
can_finish_ordered_extent() and btrfs_finish_ordered_zoned() set BTRFS_ORDERED_IOERR via bare set_bit(). Later, btrfs_mark_ordered_extent_error() in btrfs_finish_one_ordered() uses test_and_set_bit(), finds it already set, and skips mapping_set_error(). The error is never recorded on the inode's address_space, making it invisible to fsync. For encoded writes this causes btrfs receive to silently produce files with zero-filled holes. Fix: replace bare set_bit(BTRFS_ORDERED_IOERR) with btrfs_mark_ordered_extent_error() which pairs test_and_set_bit() with mapping_set_error(), guaranteeing the error is recorded exactly once. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: Michal Grzedzicki <mge@meta.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: skip clearing EXTENT_DEFRAG for NOCOW ordered extentsDave Chen
In btrfs_finish_one_ordered(), clear_bits is unconditionally initialized with EXTENT_DEFRAG. For NOCOW ordered extents this is always a no-op because should_nocow() already forces the COW path when EXTENT_DEFRAG is set, so a NOCOW ordered extent can never have EXTENT_DEFRAG on its range. Although harmless, the unconditional btrfs_clear_extent_bit() call still performs a cold rbtree lookup under the io tree spinlock on every NOCOW write completion. Avoid this by only adding EXTENT_DEFRAG to clear_bits for non-NOCOW ordered extents, and skip the call entirely when there are no bits to clear. Signed-off-by: Dave Chen <davechen@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: use BTRFS_FS_UPDATE_UUID_TREE_GEN flag for UUID tree rescan checkDave Chen
The UUID tree rescan check in open_ctree() compares fs_info->generation with the superblock's uuid_tree_generation. This comparison is not reliable because fs_info->generation is bumped at transaction start time in join_transaction(), while uuid_tree_generation is only updated at commit time via update_super_roots(). Between the early BTRFS_FS_UPDATE_UUID_TREE_GEN flag check and the late rescan decision, mount operations such as file orphan cleanup from an unclean shutdown start transactions without committing them. This advances fs_info->generation past uuid_tree_generation and produces a false-positive mismatch. Use the BTRFS_FS_UPDATE_UUID_TREE_GEN flag directly instead. The flag was already set earlier in open_ctree() when the generations were known to match, and accurately represents "UUID tree is up to date" without being affected by subsequent transaction starts. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Dave Chen <davechen@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: remove duplicate journal_info reset on failure to commit transactionFilipe Manana
If we get an error during the transaction commit path, we are resetting current->journal_info to NULL twice - once in btrfs_commit_transaction() right before calling cleanup_transaction() and then once again inside cleanup_transaction(). Remove the instance in btrfs_commit_transaction(). Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: tag as unlikely if statements that check for fs in error stateFilipe Manana
Having the filesystem in an error state, meaning we had a transaction abort, is unexpected. Mark every check for the error state with the unlikely annotation to convey that and to allow the compiler to generate better code. On x86_64, using gcc 14.2.0-19 from Debian, resulted in a slightly reduced object size and better code. Before: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 2008598 175912 15592 2200102 219226 fs/btrfs/btrfs.ko After: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 2008450 175912 15592 2199954 219192 fs/btrfs/btrfs.ko Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07Merge tag 'mm-hotfixes-stable-2026-04-06-15-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Eight hotfixes. All are cc:stable and seven are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-04-06-15-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: ocfs2: fix out-of-bounds write in ocfs2_write_end_inline mm/damon/stat: deallocate damon_call() failure leaking damon_ctx mm/vma: fix memory leak in __mmap_region() mm/memory_hotplug: maintain N_NORMAL_MEMORY during hotplug mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails mm: reinstate unconditional writeback start in balance_dirty_pages() liveupdate: propagate file deserialization failures mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
2026-04-07btrfs: fix double free in create_space_info() error pathGuangshuo Li
When kobject_init_and_add() fails, the call chain is: create_space_info() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&space_info->kobj) -> space_info_release() -> kfree(space_info) Then control returns to create_space_info(): btrfs_sysfs_add_space_info_type() returns error -> goto out_free -> kfree(space_info) This causes a double free. Keep the direct kfree(space_info) for the earlier failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: a11224a016d6d ("btrfs: fix memory leaks in create_space_info() error paths") CC: stable@vger.kernel.org # 6.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: fix double free in create_space_info_sub_group() error pathGuangshuo Li
When kobject_init_and_add() fails, the call chain is: create_space_info_sub_group() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&sub_group->kobj) -> space_info_release() -> kfree(sub_group) Then control returns to create_space_info_sub_group(), where: btrfs_sysfs_add_space_info_type() returns error -> kfree(sub_group) Thus, sub_group is freed twice. Keep parent->sub_group[index] = NULL for the failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: f92ee31e031c ("btrfs: introduce btrfs_space_info sub-group") CC: stable@vger.kernel.org # 6.18+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: do not reject a valid running dev-replaceQu Wenruo
[BUG] There is a bug report that a btrfs with running dev-replace got rejected with the following messages: BTRFS error (device sdk1): devid 0 path /dev/sdk1 is registered but not found in chunk tree BTRFS error (device sdk1): remove the above devices or use 'btrfs device scan --forget <dev>' to unregister them before mount BTRFS error (device sdk1): open_ctree failed: -117 [CAUSE] The tree and super block dumps show the fs is completely sane, except one thing, there is no dev item for devid 0 in chunk tree. However this is not a bug, as we do not insert dev item for devid 0 in the first place. Since the devid 0 is only there temporarily we do not really need to insert a dev item for it and then later remove it again. It is the commit 34308187395f ("btrfs: add extra device item checks at mount") adding a overly strict check that triggers a false alert and rejected the valid filesystem. [FIX] Add a special handling for devid 0, and doesn't require devid 0 to have a device item in chunk tree. Reported-by: Jaron Viëtor <jaron@vietors.com> Link: https://lore.kernel.org/linux-btrfs/CAF1bhLVYLZvD=j2XyuxXDKD-NWNJAwDnpVN+UYeQW-HbzNRn1A@mail.gmail.com/ Fixes: 34308187395f ("btrfs: add extra device item checks at mount") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: only invalidate btree inode pages after all ebs are releasedQu Wenruo
In close_ctree(), we call invalidate_inode_pages2() to invalidate all pages from btree inode. But the problem is, it never returns 0, but always -EBUSY. The problem is that we are still holding all the essential tree root nodes, thus pages holding those tree blocks can not be invalidated thus invalidate_inode_pages2() always returns -EBUSY. This is also against the error cleanup path of open_ctree(), which properly frees all root pointers before calling invalidate_inode_pages(). So fix the order by delaying invalidate_inode_pages2() until we have freed all root pointers. Reviewed-by: Anand Jain <asj@kernel.org> Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: prevent direct reclaim during compressed readaheadJP Kobryn (Meta)
Under memory pressure, direct reclaim can kick in during compressed readahead. This puts the associated task into D-state. Then shrink_lruvec() disables interrupts when acquiring the LRU lock. Under heavy pressure, we've observed reclaim can run long enough that the CPU becomes prone to CSD lock stalls since it cannot service incoming IPIs. Although the CSD lock stalls are the worst case scenario, we have found many more subtle occurrences of this latency on the order of seconds, over a minute in some cases. Prevent direct reclaim during compressed readahead. This is achieved by using different GFP flags at key points when the bio is marked for readahead. There are two functions that allocate during compressed readahead: btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use GFP_NOFS which includes __GFP_DIRECT_RECLAIM. For the internal API call btrfs_alloc_compr_folio(), the signature changes to accept an additional gfp_t parameter. At the readahead call site, it gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. __GFP_NOWARN is added since these allocations are allowed to fail. Demand reads still use full GFP_NOFS and will enter reclaim if needed. All other existing call sites of btrfs_alloc_compr_folio() now explicitly pass GFP_NOFS to retain their current behavior. add_ra_bio_pages() gains a bool parameter which allows callers to specify if they want to allow direct reclaim or not. In either case, the __GFP_NOWARN flag was added unconditionally since the allocations are speculative. There has been some previous work done on calling add_ra_bio_pages() [0]. This patch is complementary: where that patch reduces call frequency, this patch reduces the latency associated with those calls. [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/ Reviewed-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: replace BUG_ON() with error return in cache_save_setup()Teng Liu
In cache_save_setup(), if create_free_space_inode() succeeds but the subsequent lookup_free_space_inode() still fails on retry, the BUG_ON(retries) will crash the kernel. This can happen due to I/O errors or transient failures, not just programming bugs. Replace the BUG_ON with proper error handling that returns the original error code through the existing cleanup path. The callers already handle this gracefully: disk_cache_state defaults to BTRFS_DC_ERROR, so the space cache simply won't be written for that block group. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Teng Liu <27rabbitlt@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: zstd: don't cache sectorsize in a local variableDavid Sterba
The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Minor effect on zstd_compress_folios() where it saves 8 bytes of stack. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: zlib: don't cache sectorsize in a local variableDavid Sterba
The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: zlib: drop redundant folio address variableDavid Sterba
We're caching the current output folio address but it's not really necessary as we store it in the variable and then pass it to the stream context. We can read the folio address directly. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: lzo: inline read/write length helpersDavid Sterba
The LZO_LEN read/write helpers are supposed to be trivial and we're duplicating the put/get unaligned helpers so use them directly. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: use common eb range validation in read_extent_buffer_to_user_nofault()David Sterba
The extent buffer access is checked in other helpers by check_eb_range(), which validates the requested start, length against the extent buffer. While this almost never fails we should still handle it as an error and not just warn. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: read eb folio index right before loopsDavid Sterba
There are generic helpers to access extent buffer folio data of any length, potentially iterating over a few of them. This is a slow path, either we use the type based accessors or the eb folio allocation is contiguous and we can use the memcpy/memcmp helpers. The initialization of 'i' is done at the beginning though it may not be needed. Move it right before the folio loop, this has minor effect on generated code in __write_extent_buffer(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: rename local variable for offset in folioDavid Sterba
Use proper abbreviation of the 'offset in folio' in the variable name, same as we have in accessors.c. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: unify types for binary search variablesDavid Sterba
The variables calculating where to jump next are using mixed in types which requires some conversions on the instruction level. Using 'u32' removes one call to 'movslq', making the main loop shorter. This complements type conversion done in a724f313f84beb ("btrfs: do unsigned integer division in the extent buffer binary search loop") Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: remove duplicate calculation of eb offset in btrfs_bin_search()David Sterba
In the main search loop the variable 'oil' (offset in folio) is set twice, one duplicated when the key fits completely to the contiguous range. We can remove it and while it's just a simple calculation, the binary search loop is executed many times so micro optimizations add up. The code size is reduced by 64 bytes on release config, the loop is reorganized a bit and a few instructions shorter. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: tree-checker: add remap-tree checks to check_block_group_item()Mark Harmstone
Add some write-time checks for block group items relating to the remap tree. Here we're checking: * That the REMAPPED or METADATA_REMAP flags aren't set unless the REMAP_TREE incompat flag is also set * That `remap_bytes` isn't more than the size of the block group * That `identity_remap_count` isn't more than the number of sectors in the block group Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: make btrfs_free_log() and btrfs_free_log_root_tree() return voidFilipe Manana
These functions never fail, always return success (0) and none of the callers care about their return values. Change their return type from int to void. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: fix deadlock between reflink and transaction commit when using ↵Filipe Manana
flushoncommit When using the flushoncommit mount option, we can have a deadlock between a transaction commit and a reflink operation that copied an inline extent to an offset beyond the current i_size of the destination node. The deadlock happens like this: 1) Task A clones an inline extent from inode X to an offset of inode Y that is beyond Y's current i_size. This means we copied the inline extent's data to a folio of inode Y that is beyond its EOF, using a call to copy_inline_to_page(); 2) Task B starts a transaction commit and calls btrfs_start_delalloc_flush() to flush delalloc; 3) The delalloc flushing sees the new dirty folio of inode Y and when it attempts to flush it, it ends up at extent_writepage() and sees that the offset of the folio is beyond the i_size of inode Y, so it attempts to invalidate the folio by calling folio_invalidate(), which ends up at btrfs' folio invalidate callback - btrfs_invalidate_folio(). There it tries to lock the folio's range in inode Y's extent io tree, but it blocks since it's currently locked by task A - during a reflink we lock the inodes and the source and destination ranges after flushing all delalloc and waiting for ordered extent completion - after that we don't expect to have dirty folios in the ranges, the exception is if we have to copy an inline extent's data (because the destination offset is not zero); 4) Task A then attempts to start a transaction to update the inode item, and then it's blocked since the current transaction is in the TRANS_STATE_COMMIT_START state. Therefore task A has to wait for the current transaction to become unblocked (its state >= TRANS_STATE_UNBLOCKED). So task A is waiting for the transaction commit done by task B, and the later waiting on the extent lock of inode Y that is currently held by task A. Syzbot recently reported this with the following stack traces: INFO: task kworker/u8:7:1053 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:7 state:D stack:23520 pid:1053 tgid:1053 ppid:2 task_flags:0x4208060 flags:0x00080000 Workqueue: writeback wb_workfn (flush-btrfs-46) Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_extent_bit fs/btrfs/extent-io-tree.c:811 [inline] btrfs_lock_extent_bits+0x59c/0x700 fs/btrfs/extent-io-tree.c:1914 btrfs_lock_extent fs/btrfs/extent-io-tree.h:152 [inline] btrfs_invalidate_folio+0x43d/0xc40 fs/btrfs/inode.c:7704 extent_writepage fs/btrfs/extent_io.c:1852 [inline] extent_write_cache_pages fs/btrfs/extent_io.c:2580 [inline] btrfs_writepages+0x12ff/0x2440 fs/btrfs/extent_io.c:2713 do_writepages+0x32e/0x550 mm/page-writeback.c:2554 __writeback_single_inode+0x133/0x11a0 fs/fs-writeback.c:1750 writeback_sb_inodes+0x995/0x19d0 fs/fs-writeback.c:2042 wb_writeback+0x456/0xb70 fs/fs-writeback.c:2227 wb_do_writeback fs/fs-writeback.c:2374 [inline] wb_workfn+0x41a/0xf60 fs/fs-writeback.c:2414 process_one_work kernel/workqueue.c:3276 [inline] process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> INFO: task syz.4.64:6910 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz.4.64 state:D stack:22752 pid:6910 tgid:6905 ppid:5944 task_flags:0x400140 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_current_trans+0x39f/0x590 fs/btrfs/transaction.c:535 start_transaction+0x6a7/0x1650 fs/btrfs/transaction.c:705 clone_copy_inline_extent fs/btrfs/reflink.c:299 [inline] btrfs_clone+0x128a/0x24d0 fs/btrfs/reflink.c:529 btrfs_clone_files+0x271/0x3f0 fs/btrfs/reflink.c:750 btrfs_remap_file_range+0x76b/0x1320 fs/btrfs/reflink.c:903 vfs_copy_file_range+0xda7/0x1390 fs/read_write.c:1600 __do_sys_copy_file_range fs/read_write.c:1683 [inline] __se_sys_copy_file_range+0x2fb/0x480 fs/read_write.c:1650 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7315e028 EFLAGS: 00000246 ORIG_RAX: 0000000000000146 RAX: ffffffffffffffda RBX: 00007f5f73d75fa0 RCX: 00007f5f73afc799 RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007f5f73b92c99 R08: 0000000000000863 R09: 0000000000000000 R10: 00002000000000c0 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76038 R14: 00007f5f73d75fa0 R15: 00007fff138a5068 </TASK> INFO: task syz.4.64:6975 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz.4.64 state:D stack:24736 pid:6975 tgid:6905 ppid:5944 task_flags:0x400040 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wb_wait_for_completion+0x3e8/0x790 fs/fs-writeback.c:227 __writeback_inodes_sb_nr+0x24c/0x2d0 fs/fs-writeback.c:2838 try_to_writeback_inodes_sb+0x9a/0xc0 fs/fs-writeback.c:2886 btrfs_start_delalloc_flush fs/btrfs/transaction.c:2175 [inline] btrfs_commit_transaction+0x82e/0x31a0 fs/btrfs/transaction.c:2364 btrfs_ioctl+0xca7/0xd00 fs/btrfs/ioctl.c:5206 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xff/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7313d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f5f73d76090 RCX: 00007f5f73afc799 RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000004 RBP: 00007f5f73b92c99 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76128 R14: 00007f5f73d76090 R15: 00007fff138a5068 </TASK> Fix this by updating the i_size of the destination inode of a reflink operation after we copy an inline extent's data to an offset beyond the i_size and before attempting to start a transaction to update the inode's item. Reported-by: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/69bba3fe.050a0220.227207.002f.GAE@google.com/ Fixes: 05a5a7621ce6 ("Btrfs: implement full reflink support for inline extents") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: tree-checker: check remap-tree flags in btrfs_check_chunk_valid()Mark Harmstone
Add a check to btrfs_check_chunk_valid() that the METADATA_REMAP and REMAPPED flags are only set if the REMAP_TREE incompat flag is also set. Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>