Merge series "Extend regulator notification support" from Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>:

Extend regulator notification support This series extends the regulator notification and error flag support. Initial discussion on the topic can be found here: https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.camel@fi.rohmeurope.com/ In a nutshell - the series adds: 1. WARNING level events/error flags. (Patch 3) Current regulator 'ERROR' event notifications for over/under voltage, over current and over temperature are used to indicate condition where monitored entity is so badly "off" that it actually indicates a hardware error which can not be recovered. The most typical hanling for that is believed to be a (graceful) system-shutdown. Here we add set of 'WARNING' level flags to allow sending notifications to consumers before things are 'that badly off' so that consumer drivers can implement recovery-actions. 2. Device-tree properties for specifying limit values. (Patches 1, 5) Add limits for above mentioned 'ERROR' and 'WARNING' levels (which send notifications to consumers) and also for a 'PROTECTION' level (which will be used to immediately shut-down the regulator(s) W/O informing consumer drivers. Typically implemented by hardware). Property parsing is implemented in regulator core which then calls callback operations for limit setting from the IC drivers. A warning is emitted if protection is requested by device tree but the underlying IC does not support configuring requested protection. 3. Helpers which can be registered by IC. (Patch 4) Target is to avoid implementing IRQ handling and IRQ storm protection in each IC driver. (Many of the ICs implementin these IRQs do not allow masking or acking the IRQ but keep the IRQ asserted for the whole duration of problem keeping the processor in IRQ handling loop). 4. Emergency poweroff function (refactored out of the thermal_core to kernel/reboot.c) which is called if IC fires error IRQs but IC reading fails and given retry-count is exceeded. (Patches 2, 4) Please note that the mutex in the emergency shutdown was replaced by a simple atomic in order to allow call from any context. The helper was attempted to be done so it could be used to implement roughly same logic as is used in qcom-labibb regulator. This means amongst other things a safety shut-down if IC registers are not readable. Using these shut-down retry counters are optional. The idea is that the helper could be also used by simpler ICs which do not provide status register(s) which can be used to check if error is still active. ICs which do not have such status register can simply omit the 'renable' callback (and retry-counts etc) - and helper assumes the situation is Ok and re-enables IRQ after given time period. If problem persists the handler is ran again and another notification is sent - but at least the delay allows processor to avoid IRQ loop. Patch 7 takes this notification support in use at BD9576MUF. Patch 8 is related to MFD change which is not really related to the RFC here. It was added to this series in order to avoid potential conflicts. Patch 9 adds a maintainers entry. Changelog v10-RESEND: - rebased on v5.13-rc4 Changelog v10: - rebased on v5.13-rc2 - Move rdev_*() print macros to the internal.h and use rdev_dbg() from irq_helpers.c - Export rdev_get_name() and move it from coupler.h to driver.h for others to use. (It was already in coupler.h but not exported - usage was limited and coupler.h does not sound like optimal place as rdev_name is not only used by coupled regulators) - Send all regulator notifications from irq_helpers.c at one OR'd event for the sake of simplicity. For BD9576 this does not matter as it has own IRQ for each event case. Header defining events says they may be OR'd. - Change WARN() at protection shutdown to pr_emerg as suggested by Petr. Changelog v9: - rebases on v5.13-rc1 - Update thermal documentation - Fix regulator notification event number Changelog v8: - split shutdown API adding and thermal core taking it in use to own patches. - replace the spinlock with atomic when ensuring the emergency shutdown is only called once. Changelog v7: general: - rebased on v5.12-rc7 - new patch for refactoring the hw-failure reboot logic out of thermal_core.c for others to use. notification helpers: - fix regulator error_flags query - grammar/typos - do not BUG() but attempt to shut-down the system - use BITS_PER_TYPE() Changelog v6: Add MAINTAINERS entry Changes to IRQ notifiers - move devm functions to drivers/regulator/devres.c - drop irq validity check - use devm_add_action_or_reset() - fix styling issues - fix kerneldocs Changelog v5: - Fix the badly formatted pr_emerg() call. Changelog v4: - rebased on v5.12-rc6 - dropped RFC - fix external FET DT-binding. - improve prints for cases when expecting HW failure. - styling and typos Changelog v3: Regulator core: - Fix dangling pointer access at regulator_irq_helper() stpmic1_regulator: - fix function prototype (compile error) bd9576-regulator: - Update over current limits to what was given in new data-sheet (REV00K) - Allow over-current monitoring without external FET. Set limits to values given in data-sheet (REV00K). Changelog v2: Generic: - rebase on v5.12-rc2 + BD9576 series - Split devm variant of delayed wq to own series Regulator framework: - Provide non devm variant of IRQ notification helpers - shorten dt-property names as suggested by Rob - unconditionally call map_event in IRQ handling and require it to be populated BD9576 regulators: - change the FET resistance property to micro-ohms - fix voltage computation in OC limit setting
author: Mark Brown <broonie@kernel.org> 2021-06-21 19:28:42 +0100
committer: Mark Brown <broonie@kernel.org> 2021-06-21 19:28:42 +0100
commit: 9d598cd737d15b5770c5bddf35a512f7ab07b78b (patch)
tree: d0ec62945d9fb27aa1ac7231247c5c1f42813d12 /mm
parent: 57c045bc727001c43b6a65adb0418aa7b3e6dbd0 (diff)
parent: d55444adedaee5a3024c61637032057fcf38491b (diff)
10 files changed, 48 insertions, 72 deletions
diff --git a/mm/gup.c b/mm/gup.c
index 0697134b6a12..3ded6a5f26b2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1593,10 +1593,6 @@ struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
-
-	if (ret == 1 && is_page_poisoned(page))
-		return NULL;
-
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3db405dea3dc..95918f410c0f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4056,6 +4056,7 @@ again:
 				 * See Documentation/vm/mmu_notifier.rst
 				 */
 				huge_ptep_set_wrprotect(src, addr, src_pte);
+				entry = huge_pte_wrprotect(entry);
 			}
 
 			page_dup_rmap(ptepage, true);
diff --git a/mm/internal.h b/mm/internal.h
index 54bd0dc2c23c..2f1182948aa6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -96,26 +96,6 @@ static inline void set_page_refcounted(struct page *page)
 	set_page_count(page, 1);
 }
 
-/*
- * When kernel touch the user page, the user page may be have been marked
- * poison but still mapped in user space, if without this page, the kernel
- * can guarantee the data integrity and operation success, the kernel is
- * better to check the posion status and avoid touching it, be good not to
- * panic, coredump for process fatal signal is a sample case matching this
- * scenario. Or if kernel can't guarantee the data integrity, it's better
- * not to call this function, let kernel touch the poison page and get to
- * panic.
- */
-static inline bool is_page_poisoned(struct page *page)
-{
-	if (PageHWPoison(page))
-		return true;
-	else if (PageHuge(page) && PageHWPoison(compound_head(page)))
-		return true;
-
-	return false;
-}
-
 extern unsigned long highest_memmap_pfn;
 
 /*
diff --git a/mm/ioremap.c b/mm/ioremap.c
index d1dcc7e744ac..8ee0136f8cb0 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -16,16 +16,16 @@
 #include "pgalloc-track.h"
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static bool __ro_after_init iomap_max_page_shift = PAGE_SHIFT;
+static unsigned int __ro_after_init iomap_max_page_shift = BITS_PER_LONG - 1;
 
 static int __init set_nohugeiomap(char *str)
 {
-	iomap_max_page_shift = P4D_SHIFT;
+	iomap_max_page_shift = PAGE_SHIFT;
 	return 0;
 }
 early_param("nohugeiomap", set_nohugeiomap);
 #else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
-static const bool iomap_max_page_shift = PAGE_SHIFT;
+static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 int ioremap_page_range(unsigned long addr,
diff --git a/mm/ksm.c b/mm/ksm.c
index 6bbe314c5260..2f3aaeb34a42 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -776,11 +776,12 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
 		struct page *page;
 
 		stable_node = rmap_item->head;
-		page = get_ksm_page(stable_node, GET_KSM_PAGE_NOLOCK);
+		page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
 		if (!page)
 			goto out;
 
 		hlist_del(&rmap_item->hlist);
+		unlock_page(page);
 		put_page(page);
 
 		if (!hlist_empty(&stable_node->hlist))
diff --git a/mm/shmem.c b/mm/shmem.c
index a08cedefbfaa..5d46611cba8d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,25 +2258,11 @@ out_nomem:
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+	int ret;
 
-	if (info->seals & F_SEAL_FUTURE_WRITE) {
-		/*
-		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
-		 * "future write" seal active.
-		 */
-		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
-			return -EPERM;
-
-		/*
-		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
-		 * MAP_SHARED and read-only, take care to not allow mprotect to
-		 * revert protections on such mappings. Do this only for shared
-		 * mappings. For private mappings, don't need to mask
-		 * VM_MAYWRITE as we still want them to be COW-writable.
-		 */
-		if (vma->vm_flags & VM_SHARED)
-			vma->vm_flags &= ~(VM_MAYWRITE);
-	}
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
 
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;
@@ -2375,8 +2361,18 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	pgoff_t offset, max_off;
 
 	ret = -ENOMEM;
-	if (!shmem_inode_acct_block(inode, 1))
+	if (!shmem_inode_acct_block(inode, 1)) {
+		/*
+		 * We may have got a page, returned -ENOENT triggering a retry,
+		 * and now we find ourselves with -ENOMEM. Release the page, to
+		 * avoid a BUG_ON in our caller.
+		 */
+		if (unlikely(*pagep)) {
+			put_page(*pagep);
+			*pagep = NULL;
+		}
 		goto out;
+	}
 
 	if (!*pagep) {
 		page = shmem_alloc_page(gfp, info, pgoff);
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 71b784f0b7c3..cec62984f7d3 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -10,7 +10,7 @@
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
 extern bool shuffle_pick_tail(void);
-static inline void shuffle_free_memory(pg_data_t *pgdat)
+static inline void __meminit shuffle_free_memory(pg_data_t *pgdat)
 {
 	if (!static_branch_unlikely(&page_alloc_shuffle_key))
 		return;
@@ -18,7 +18,7 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
 }
 
 extern void __shuffle_zone(struct zone *z);
-static inline void shuffle_zone(struct zone *z)
+static inline void __meminit shuffle_zone(struct zone *z)
 {
 	if (!static_branch_unlikely(&page_alloc_shuffle_key))
 		return;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f8833d3e5d47..a4a571428c51 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -318,6 +318,16 @@ kmem_cache_create_usercopy(const char *name,
 	const char *cache_name;
 	int err;
 
+#ifdef CONFIG_SLUB_DEBUG
+	/*
+	 * If no slub_debug was enabled globally, the static key is not yet
+	 * enabled by setup_slub_debug(). Enable it if the cache is being
+	 * created with any of the debugging flags passed explicitly.
+	 */
+	if (flags & SLAB_DEBUG_FLAGS)
+		static_branch_enable(&slub_debug_enabled);
+#endif
+
 	mutex_lock(&slab_mutex);
 
 	err = kmem_cache_sanity_check(name, size);
diff --git a/mm/slub.c b/mm/slub.c
index feda53ae62ba..3f96e099817a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -301,6 +301,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
 	if (!debug_pagealloc_enabled_static())
 		return get_freepointer(s, object);
 
+	object = kasan_reset_tag(object);
 	freepointer_addr = (unsigned long)object + s->offset;
 	copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
 	return freelist_ptr(s, p, freepointer_addr);
@@ -3828,15 +3829,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
-#ifdef CONFIG_SLUB_DEBUG
-	/*
-	 * If no slub_debug was enabled globally, the static key is not yet
-	 * enabled by setup_slub_debug(). Enable it if the cache is being
-	 * created with any of the debugging flags passed explicitly.
-	 */
-	if (flags & SLAB_DEBUG_FLAGS)
-		static_branch_enable(&slub_debug_enabled);
-#endif
 	s->flags = kmem_cache_flags(s->size, flags, s->name);
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e14b3820c6a8..63a73e164d55 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -360,38 +360,38 @@ out:
 		 * If a reservation for the page existed in the reservation
 		 * map of a private mapping, the map was modified to indicate
 		 * the reservation was consumed when the page was allocated.
-		 * We clear the PagePrivate flag now so that the global
+		 * We clear the HPageRestoreReserve flag now so that the global
 		 * reserve count will not be incremented in free_huge_page.
 		 * The reservation map will still indicate the reservation
 		 * was consumed and possibly prevent later page allocation.
 		 * This is better than leaking a global reservation.  If no
-		 * reservation existed, it is still safe to clear PagePrivate
-		 * as no adjustments to reservation counts were made during
-		 * allocation.
+		 * reservation existed, it is still safe to clear
+		 * HPageRestoreReserve as no adjustments to reservation counts
+		 * were made during allocation.
 		 *
 		 * The reservation map for shared mappings indicates which
 		 * pages have reservations.  When a huge page is allocated
 		 * for an address with a reservation, no change is made to
-		 * the reserve map.  In this case PagePrivate will be set
-		 * to indicate that the global reservation count should be
+		 * the reserve map.  In this case HPageRestoreReserve will be
+		 * set to indicate that the global reservation count should be
 		 * incremented when the page is freed.  This is the desired
 		 * behavior.  However, when a huge page is allocated for an
 		 * address without a reservation a reservation entry is added
-		 * to the reservation map, and PagePrivate will not be set.
-		 * When the page is freed, the global reserve count will NOT
-		 * be incremented and it will appear as though we have leaked
-		 * reserved page.  In this case, set PagePrivate so that the
-		 * global reserve count will be incremented to match the
-		 * reservation map entry which was created.
+		 * to the reservation map, and HPageRestoreReserve will not be
+		 * set. When the page is freed, the global reserve count will
+		 * NOT be incremented and it will appear as though we have
+		 * leaked reserved page.  In this case, set HPageRestoreReserve
+		 * so that the global reserve count will be incremented to
+		 * match the reservation map entry which was created.
 		 *
 		 * Note that vm_alloc_shared is based on the flags of the vma
 		 * for which the page was originally allocated.  dst_vma could
 		 * be different or NULL on error.
 		 */
 		if (vm_alloc_shared)
-			SetPagePrivate(page);
+			SetHPageRestoreReserve(page);
 		else
-			ClearPagePrivate(page);
+			ClearHPageRestoreReserve(page);
 		put_page(page);
 	}
 	BUG_ON(copied < 0);
author	Mark Brown <broonie@kernel.org>	2021-06-21 19:28:42 +0100
committer	Mark Brown <broonie@kernel.org>	2021-06-21 19:28:42 +0100
commit	9d598cd737d15b5770c5bddf35a512f7ab07b78b (patch)
tree	d0ec62945d9fb27aa1ac7231247c5c1f42813d12 /mm
parent	57c045bc727001c43b6a65adb0418aa7b3e6dbd0 (diff)
parent	d55444adedaee5a3024c61637032057fcf38491b (diff)