summaryrefslogtreecommitdiff
path: root/fs/dlm
AgeCommit message (Collapse)Author
5 daysConvert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 daystreewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-10Merge tag 'locking-core-2026-02-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Lock debugging: - Implement compiler-driven static analysis locking context checking, using the upcoming Clang 22 compiler's context analysis features (Marco Elver) We removed Sparse context analysis support, because prior to removal even a defconfig kernel produced 1,700+ context tracking Sparse warnings, the overwhelming majority of which are false positives. On an allmodconfig kernel the number of false positive context tracking Sparse warnings grows to over 5,200... On the plus side of the balance actual locking bugs found by Sparse context analysis is also rather ... sparse: I found only 3 such commits in the last 3 years. So the rate of false positives and the maintenance overhead is rather high and there appears to be no active policy in place to achieve a zero-warnings baseline to move the annotations & fixers to developers who introduce new code. Clang context analysis is more complete and more aggressive in trying to find bugs, at least in principle. Plus it has a different model to enabling it: it's enabled subsystem by subsystem, which results in zero warnings on all relevant kernel builds (as far as our testing managed to cover it). Which allowed us to enable it by default, similar to other compiler warnings, with the expectation that there are no warnings going forward. This enforces a zero-warnings baseline on clang-22+ builds (Which are still limited in distribution, admittedly) Hopefully the Clang approach can lead to a more maintainable zero-warnings status quo and policy, with more and more subsystems and drivers enabling the feature. Context tracking can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default disabled), but this will generate a lot of false positives. ( Having said that, Sparse support could still be added back, if anyone is interested - the removal patch is still relatively straightforward to revert at this stage. ) Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng) - Add support for Atomic<i8/i16/bool> and replace most Rust native AtomicBool usages with Atomic<bool> - Clean up LockClassKey and improve its documentation - Add missing Send and Sync trait implementation for SetOnce - Make ARef Unpin as it is supposed to be - Add __rust_helper to a few Rust helpers as a preparation for helper LTO - Inline various lock related functions to avoid additional function calls WW mutexes: - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz) Misc fixes and cleanups: - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd Bergmann) - locking/local_lock: Include more missing headers (Peter Zijlstra) - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap) - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir Duberstein)" * tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits) locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK rcu: Mark lockdep_assert_rcu_helper() __always_inline compiler-context-analysis: Remove __assume_ctx_lock from initializers tomoyo: Use scoped init guard crypto: Use scoped init guard kcov: Use scoped init guard compiler-context-analysis: Introduce scoped init guards cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers seqlock: fix scoped_seqlock_read kernel-doc tools: Update context analysis macros in compiler_types.h rust: sync: Replace `kernel::c_str!` with C-Strings rust: sync: Inline various lock related methods rust: helpers: Move #define __rust_helper out of atomic.c rust: wait: Add __rust_helper to helpers rust: time: Add __rust_helper to helpers rust: task: Add __rust_helper to helpers rust: sync: Add __rust_helper to helpers rust: refcount: Add __rust_helper to helpers rust: rcu: Add __rust_helper to helpers rust: processor: Add __rust_helper to helpers ...
2026-02-02dlm: Avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declaration to the end of the corresponding structure. Notice that `struct dlm_message` is a flexible structure, this is a structure that contains a flexible-array member. Fix the following warning: fs/dlm/dlm_internal.h:609:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20fs/dlm/dir: remove unuse variable count_matchAlex Shi
The variable was never used after introduced. Better to comment it if we want to keep the info. fs/dlm/dir.c:65:26: error: variable 'count_match' set but not used [-Werror,-Wunused-but-set-variable] 65 | unsigned int count = 0, count_match = 0, count_bad = 0, count_add = 0; | ^ 1 error generated. Signed-off-by: Alex Shi <alexs@kernel.org> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20dlm: Constify struct configfs_item_operations and configfs_group_operationsChristophe JAILLET
'struct configfs_item_operations' and 'configfs_group_operations' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 29436 12952 384 42772 a714 fs/dlm/config.o After: ===== text data bss dec hex filename 30076 12312 384 42772 a714 fs/dlm/config.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20fs/dlm: use list_add_tail() instead of open-coding list insertionShaurya Rane
Replace the manual list pointer manipulation in add_ordered_member() with the standard list_add_tail() helper. The original code explicitly updated ->prev and ->next pointers to insert @newlist before @tmp, which is exactly what list_add_tail(newlist, tmp) provides. Using the list macro improves readability, removes a source of potential pointer bugs, and satisfies the existing FIXME requesting conversion to the list helpers. No functional change in the ordering logic for DLM members. Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20dlm: validate length in dlm_search_rsb_treeEzrak1e
The len parameter in dlm_dump_rsb_name() is not validated and comes from network messages. When it exceeds DLM_RESNAME_MAXLEN, it can cause out-of-bounds write in dlm_search_rsb_tree(). Add length validation to prevent potential buffer overflow. Signed-off-by: Ezrak1e <ezrakiez@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20dlm: fix recovery pending middle conversionAlexander Aring
During a workload involving conversions between lock modes PR and CW, lock recovery can create a "conversion deadlock" state between locks that have been recovered. When this occurs, kernel warning messages are logged, e.g. "dlm: WARN: pending deadlock 1e node 0 2 1bf21" "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e" After this occurs, the deadlocked conversions both appear on the convert queue of the resource being locked, and the conversion requests do not complete. Outside of recovery, conversions that would produce a deadlock are resolved immediately, and return -EDEADLK. The locks are not placed on the convert queue in the deadlocked state. To fix this problem, an lkb under conversion between PR/CW is rebuilt during recovery on a new master's granted queue, with the currently granted mode, rather than being rebuilt on the new master's convert queue, with the currently granted mode and the newly requested mode. The in-progress convert is then resent to the new master after recovery, so the conversion deadlock will be processed outside of the recovery context and handled as described above. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-05compiler-context-analysis: Change __cond_acquires to take return valueMarco Elver
While Sparse is oblivious to the return value of conditional acquire functions, Clang's context analysis needs to know the return value which indicates successful acquisition. Add the additional argument, and convert existing uses. Notably, Clang's interpretation of the value merely relates to the use in a later conditional branch, i.e. 1 ==> context lock acquired in branch taken if condition non-zero, and 0 ==> context lock acquired in branch taken if condition is zero. Given the precise value does not matter, introduce symbolic variants to use instead of either 0 or 1, which should be more intuitive. No functional change intended. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251219154418.3592607-10-elver@google.com
2025-11-04net: Convert proto callbacks from sockaddr to sockaddr_unsizedKees Cook
Convert struct proto pre_connect(), connect(), bind(), and bind_add() callback function prototypes from struct sockaddr to struct sockaddr_unsized. This does not change per-implementation use of sockaddr for passing around an arbitrarily sized sockaddr struct. Those will be addressed in future patches. Additionally removes the no longer referenced struct sockaddr from include/net/inet_common.h. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-5-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'dlm-6.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This adds a dlm_release_lockspace() flag to request that node-failure recovery be performed for the node leaving the lockspace. The implementation of this flag requires coordination with userland clustering components. It's been requested for use by GFS2" * tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: check for undefined release_option values dlm: handle release_option as unsigned dlm: move to rinfo for all middle conversion cases dlm: handle invalid lockspace member remove dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release dlm: add new configfs entry release_recover for lockspace members dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace dlm: use defines for force values in dlm_release_lockspace dlm: check for defined force value in dlm_lockspace_release
2025-09-19fs: WQ_PERCPU added to alloc_workqueue usersMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This patch adds a new WQ_PERCPU flag to all the fs subsystem users to explicitly request the use of the per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. All existing users have been updated accordingly. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://lore.kernel.org/20250916082906.77439-4-marco.crivellari@suse.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-14dlm: check for undefined release_option valuesAlexander Aring
Checking on all undefined release_option values to return -EINVAL in case a user is providing them to dlm_release_lockspace(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-14dlm: handle release_option as unsignedAlexander Aring
Future patches will introduce a invalid argument check for undefined values. All values for release_option are positive integer values to not check on negative values as well we just change the parameter to unsigned int. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-14dlm: move to rinfo for all middle conversion casesAlexander Aring
Since commit f74dacb4c8116 ("dlm: fix recovery of middle conversions") we introduced additional debugging information if we hit the middle conversion by using log_limit(). The DLM log_limit() functionality requires a DLM debug option being enabled. As this case is so rarely and excempt any potential introduced new issue with recovery we switching it to log_rinfo() ad this is ratelimited under normal DLM loglevel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-14dlm: handle invalid lockspace member removeAlexander Aring
Since commit de7b4869b4ecf ("dlm: add new configfs entry release_recover for lockspace members") we are moving lockspace members into a gone list before removing them to get additional removing attributes from configfs. There is still a very unlikely possibility when find_config_node() returns NULL, then for some reason the node wasn't marked as gone but it was removed. We will just handle this case and drop an error to observe if this case can ever happen. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/gfs2/aJ2Ssuh8xlsTutrA@stanley.mountain/T/#u Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-12dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_releaseAlexander Aring
When dlm_lockspace_release() is passed DLM_RELEASE_RECOVER, it tells the dlm to handle the release/leave as if the node had failed, i.e. perform recovery steps for a failed node, like recover_slot(). When DLM_RELEASE_RECOVER is set: - dlm_release_lockspace() includes RELEASE_RECOVER=1 in the OFFLINE uevent sent to userspace. - userspace/dlm_controld sends a message to all lockspace members indicating that the subsequent node removal should be handled as if the node had failed. - when dlm_controld on all nodes receives the new message, it sets the release_recover configfs entry to 1 for the node. - when the dlm/kernel next performs recovery and removes the node, it will see that release_recover has been set, and will perform recovery steps for the node as if it had failed, e.g. the recover_slot() callback is called to notify the fs. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-12dlm: add new configfs entry release_recover for lockspace membersAlexander Aring
A new configfs entry is added for a lockspace member: /config/dlm/<cluster>/spaces/<space>/nodes/<node>/release_recover release_recover can be set to 1 by userspace (dlm_controld process) prior to removing the lockspace member (rmdir of the <node>). This tells the kernel to handle the removed member as if it had failed, i.e. recovery steps for a failed node should be perfomed, as opposed to the recovery steps for a node doing a controlled leave. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-12dlm: add new RELEASE_RECOVER uevent attribute for release_lockspaceAlexander Aring
RELEASE_RECOVER=0|1 is passed to user space as part of the OFFLINE uevent for leaving a lockspace, and used by subsequent patches. RELEASE_RECOVER=1 tells user space that the release_lockspace/leave should be handled by the remaining lockspace members as if the leaving node had failed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-12dlm: use defines for force values in dlm_release_lockspaceAlexander Aring
Clarify the use of the force parameter by renaming it to "release_option" and adding defines (with descriptions) for each of the accepted values. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-12dlm: check for defined force value in dlm_lockspace_releaseAlexander Aring
Force values over 3 are undefined, so don't treat them as 3. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-04-30dlm: drop SCTP Kconfig dependencyAlexander Aring
DLM does not use any exported SCTP function. SCTP is registered dynamically as protocol to the kernel and can be used over the right protocol identifiers on the socket api. We drop the SCTP dependency as DLM can also be used with TCP only. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Heming zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-04-30dlm: reject SCTP configuration if not enabledAlexander Aring
Reject SCTP dlm configuration if the kernel was never build with SCTP. Currently the only one known user space tool "dlm_controld" will drop an error in the logs and getting stuck. This behaviour should be fixed to deliver an error to the user or fallback to TCP. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Heming zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-04-30dlm: use SHUT_RDWR for SCTP shutdownAlexander Aring
Currently SCTP shutdown() call gets stuck because there is no incoming EOF indicator on its socket. On the peer side the EOF indicator as recvmsg() returns 0 will be triggered as mechanism to flush the socket queue on the receive side. In SCTP recvmsg() function sctp_recvmsg() we can see that only if sk_shutdown has the bit RCV_SHUTDOWN set SCTP will recvmsg() will return EOF. The RCV_SHUTDOWN bit will only be set when shutdown with SHUT_RD is called. We use now SHUT_RDWR to also get a EOF indicator from recvmsg() call on the shutdown() initiator. SCTP does not support half closed sockets and the semantic of SHUT_WR is different here, it seems that calling SHUT_WR on sctp sockets keeps the socket open to have the possibility to do some specific SCTP operations on it that we don't do here. There exists still a difference in the limitations of TCP vs SCTP in case if we are required to have a half closed socket functionality. This was tried to archieve with DLM protocol changes in the past and hopefully we really don't require half closed socket functionality. Signed-off-by: Alexander Aring <aahringo@redhat.com> Tested-by: Heming zhao <heming.zhao@suse.com> Reviewed-by: Heming zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-04-30dlm: mask sk_shutdown valueAlexander Aring
The sk->sk_shutdown value is flag value so use masking to check if RCV_SHUTDOWN is set as other possible values like SEND_SHUTDOWN can set as well. Signed-off-by: Alexander Aring <aahringo@redhat.com> Tested-by: Heming zhao <heming.zhao@suse.com> Reviewed-by: Heming zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-03-18dlm: make tcp still work in multi-link envHeming Zhao
This patch bypasses multi-link errors in TCP mode, allowing dlm to operate on the first tcp link. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-28dlm: fix error if active rsb is not hashedAlexander Aring
If an active rsb is not hashed anymore and this could occur because we releases and acquired locks we need to signal the followed code that the lookup failed. Since the lookup was successful, but it isn't part of the rsb hash anymore we need to signal it by setting error to -EBADR as dlm_search_rsb_tree() does it. Cc: stable@vger.kernel.org Fixes: 5be323b0c64d ("dlm: move dlm_search_rsb_tree() out of lock") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-28dlm: fix error if inactive rsb is not hashedAlexander Aring
If an inactive rsb is not hashed anymore and this could occur because we releases and acquired locks we need to signal the followed code that the lookup failed. Since the lookup was successful, but it isn't part of the rsb hash anymore we need to signal it by setting error to -EBADR as dlm_search_rsb_tree() does it. Cc: stable@vger.kernel.org Fixes: 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-10dlm: prevent NPD when writing a positive value to event_doneThadeu Lima de Souza Cascardo
do_uevent returns the value written to event_done. In case it is a positive value, new_lockspace would undo all the work, and lockspace would not be set. __dlm_new_lockspace, however, would treat that positive value as a success due to commit 8511a2728ab8 ("dlm: fix use count with multiple joins"). Down the line, device_create_lockspace would pass that NULL lockspace to dlm_find_lockspace_local, leading to a NULL pointer dereference. Treating such positive values as successes prevents the problem. Given this has been broken for so long, this is unlikely to break userspace expectations. Fixes: 8511a2728ab8 ("dlm: fix use count with multiple joins") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-10dlm: increase max number of links for corosync3/knetHeming Zhao
This patch increases the maximum number of links that can be used with corosync3/knet. The majority of the changes are in user space dlm_tools/dlm_controld. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-12-19dlm: return -ENOENT if no comm was foundAlexander Aring
Currently if no comm can be found dlm_comm_seq() returns -EEXIST which means entry already exists for a lookup it makes no sense to return -EEXIST. We change it to -ENOENT. There is no user that will evaluate the return value on a specific value so this should be fine. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-12-19dlm: fix srcu_read_lock() return type to intAlexander Aring
The return type of srcu_read_lock() is int and not bool. Whereas we using the ret variable only to evaluate a bool type of dlm_lowcomms_con_has_addr() to check if an address is already being set. Fixes: 6f0b0b5d7ae7 ("fs: dlm: remove dlm_node_addrs lookup list") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-12-19dlm: fix removal of rsb struct that is master and dir recordAlexander Aring
An rsb struct was not being removed in the case where it was both the master and the dir record. This case (master and dir node) was missed in the condition for doing add_scan() from deactivate_rsb(). Fixing this triggers a related WARN_ON that needs to be fixed, and requires adjusting where two del_scan() calls are made. Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-11-18dlm: fix dlm_recover_members refcount on errorAlexander Aring
If dlm_recover_members() fails we don't drop the references of the previous created root_list that holds and keep all rsbs alive during the recovery. It might be not an unlikely event because ping_members() could run into an -EINTR if another recovery progress was triggered again. Fixes: 3a747f4a2ee8 ("dlm: move rsb root_list to ls_recover() stack") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-11-15dlm: fix recovery of middle conversionsAlexander Aring
In one special case, recovery is unable to reliably rebuild lock state by simply recreating lkb structs as sent from the lock holders. That case is when the lkb's include conversions between PR and CW modes. The recovery code has always recognized this special case, but the implemention has always been broken, and would set invalid modes in recovered lkb's. Unpredictable or bogus errors could then be returned for further locking calls on these locks. This bug has gone unnoticed for so long due to some combination of: - applications never or infrequently converting between PR/CW - recovery not occuring during these conversions - if the recovery bug does occur, the caller may not notice, depending on what further locking calls are made, e.g. if the lock is simply unlocked it may go unnoticed However, a core analysis from a recent gfs2 bug report points to this broken code. PR = Protected Read CW = Concurrent Write PR and CW are incompatible PR and PR are compatible CW and CW are compatible Example 1 node C, resource R granted: PR node A granted: PR node B granted: NL node C granted: NL node D - A sends convert PR->CW to C - C fails before A gets a reply - recovery occurs At this point, A does not know if it still holds the lock in PR, or if its conversion to CW was granted: - If A's conversion to CW was granted, then another node's CW lock may also have been granted. - If A's conversion to CW was not granted, it still holds a PR lock, and other nodes may also hold PR locks. So, the new master of R cannot simply recreate the lock from A using granted mode PR and requested mode CW. The new master must look at all the recovered locks to determine the correct granted modes, and ensure that all the recovered locks are recreated in compatible states. The correct lock recovery steps in this example are: - node D becomes the new master of R - node B sends D its lkb, granted PR - node A sends D its lkb, convert PR->CW - D determines the correct lock state is: granted: PR node B convert: PR->CW node A The lkb sent by each node was recreated without any change on the new master node. Example 2 node C, resource R granted: PR node A granted: NL node C granted: NL node D waiting: CW node B - A sends convert PR->CW to C - C grants the conversion to CW for A - C grants the waiting request for CW to B - C sends granted message to B, but fails before it can send the granted message to A - B receives the granted message from C At this point: - A believes it is converting PR->CW - B believes it is holding a CW lock The correct lock recovery steps in this example are: - node D becomes the new master of R - node A sends D its lkb, convert PR->CW - node B sends D its lkb, granted CW - D determins the correct lock state is: granted: CW node B granted: CW node A The lkb sent by B is recreated without change, but the lkb sent by A is changed because the granted mode was not compatible. Fixes to make this work correctly: recover_convert_waiter: should not make any changes to a converting lkb that is still waiting for a reply message. It was previously setting grmode to IV, which is invalid state, so the lkb would not be handled correctly by other code. receive_rcom_lock_args: was checking the wrong lkb field (wait_type instead of status) to determine if the lkb is being converted, and in need of inspection for this special recovery. It was also setting grmode to IV in the lkb, causing it to be mishandled by other code. Now, this function just puts the lkb, directly as sent, onto the convert queue of the resource being recovered, and corrects it in recover_conversion() later, if needed. recover_conversion: the job of this function is to detect and correct lkb states for the special PR/CW conversions. The new code now checks for recovered lkbs on the granted queue with grmode PR or CW, and takes the real grmode from that. Then it looks for lkbs on the convert queue with an incompatible grmode (i.e. grmode PR when the real grmode is CW, or v.v.) These converting lkbs need to be fixed. They are fixed by temporarily setting their grmode to NL, so that grmodes are not incompatible and won't confuse other locking code. The converting lkb will then be granted at the end of recovery, replacing the temporary NL grmode. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: make add_to_waiters() that it can't failAlexander Aring
If add_to_waiters() fails we have a problem because the previous called functions such as validate_lock_args() or validate_unlock_args() sets specific lkb values that are set for a request, there exists no way back to revert those changes. When there is a pending lock request the original request arguments will be overwritten with unknown consequences. The good news are that I believe those cases that we fail in add_to_waiters() can't happen or very unlikely to happen (only if the DLM user does stupid API things), but if so we have the above mentioned problem. There are two conditions that will be removed here. The first one is the -EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel() and mstype == DLM_MSG_CANCEL). The is_overlap_unlock() is missing for the normal UNLOCK case which is moved to validate_unlock_args(). The is_overlap_cancel() already happens in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of validate_lock_args() we check on is_overlap() when it is not a new request, on a new request the lkb is always new and does not have those values set. The -EBUSY check can't happen in case as for non new lock requests (when DLM_LKF_CONVERT is set) we already check in validate_lock_args() for lkb_wait_type and is_overlap(). Then there is only validate_unlock_args() that will never hit the default case because dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: dlm_config_info config fields to unsigned intAlexander Aring
We are using kstrtouint() to parse common integer fields. This patch will switch to use unsigned int instead of int as we are parsing unsigned integer values. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: use dlm_config as only cluster configurationAlexander Aring
This patch removes the configfs storage fields from the dlm_cluster structure to store per cluster values. Those fields also exists for the dlm_config global variable and get stored in both when setting configfs values. To read values it will always be read out from the dlm_cluster configfs structure but this patch changes it to only use the global dlm_config variable. Storing them in two places makes no sense as both are able to be changed under certain conditions during DLM runtime. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: handle port as __be16 network byte orderAlexander Aring
This patch handles the DLM listen port setting internally as byte order as it is a value that is used as network byte on the wire. The user space still sets this value as host byte order for configfs as we don't break UAPI here. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: disallow different configs nodeid storagesAlexander Aring
The DLM configfs path has usually a nodeid in it's directory path and again a file to store the nodeid again in a separate storage. It is forced that the user space will set both (the directory name and nodeid file) storage to the same value if it doesn't do that we run in some kind of broken state. This patch will simply represent the file storage to it's upper directory nodeid name. It will force the user now to use a valid unsigned int as nodeid directory name and will ignore all nodeid writes in the nodeid file storage as this will now always represent the upper nodeid directory name. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: fix possible lkb_resource null dereferenceAlexander Aring
This patch fixes a possible null pointer dereference when this function is called from request_lock() as lkb->lkb_resource is not assigned yet, only after validate_lock_args() by calling attach_lkb(). Another issue is that a resource name could be a non printable bytearray and we cannot assume to be ASCII coded. The log functionality is probably never being hit when DLM is used in normal way and no debug logging is enabled. The null pointer dereference can only occur on a new created lkb that does not have the resource assigned yet, it probably never hits the null pointer dereference but we should be sure that other changes might not change this behaviour and we actually can hit the mentioned null pointer dereference. In this patch we just drop the printout of the resource name, the lkb id is enough to make a possible connection to a resource name if this exists. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04dlm: fix swapped args sb_flags vs sb_statusAlexander Aring
The arguments got swapped by commit 986ae3c2a8df ("dlm: fix race between final callback and remove") fixing this now. Fixes: 986ae3c2a8df ("dlm: fix race between final callback and remove") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-09-27[tree-wide] finally take no_llseek outAl Viro
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-12dlm: add missing -ENOMEM if alloc_workqueue() failsAlexander Aring
This patch sets an missing -ENOMEM as error return value when the allocation of the dlm workqueue fails. Fixes: 94e180d6255f ("dlm: async freeing of lockspace resources") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202408110800.OsoP8TB9-lkp@intel.com/ Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08dlm: do synchronized socket connect callAlexander Aring
To avoid -EINPROGRESS cases on connect that just ends in a retry we just call connect in a synchronized way to wait until its done. Since commit dbb751ffab0b ("fs: dlm: parallelize lowcomms socket handling") we have a non ordered workqueue running for serving the DLM sockets that allows us to call send/recv for each DLM socket connection in parallel. Before each worker needed to wait until the previous worker was done and probably the reason why connect() was called in an asynchronous way to not block other workers. This is however not necessary anymore as other socket handling workers don't need to wait. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08dlm: move lkb xarray lookup out of lockAlexander Aring
This patch moves the xarray lookup functionality for the lkb out of the ls_lkbxa_lock read lock handling. We can do that as the xarray should be possible to access lockless in case of reader like xa_load(). We confirm under ls_lkbxa_lock that the lkb is still part of the data structure and take a reference when its still part of ls_lkbxa to avoid being freed after doing the lookup. To do a check if the lkb is still part of the ls_lkbxa data structure we use a kref_read() as the last put will remove it from the ls_lkbxa data structure and any reference taken means it is still part of ls_lkbxa. A similar approach was done with the DLM rsb rhashtable just with a flag instead of the refcounter because the refcounter has a slightly different meaning. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>