summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-08-11Merge tag 'drm-misc-next-fixes-2020-08-05' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next-fixes for v5.9-rc1: - Fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi - Fix a fbcon OOB read in fbdev, found by syzbot. - Mark vga_tryget static as it's not used elsewhere. - Small fixes to xlnx. - Remove null check for kfree in drm_dev_release. - Fix DRM_FORMAT_MOD_AMLOGIC_FBC definition. - Fix mode initialization in omap_connector_mode_valid(). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/b2043dad-f118-bd19-54a6-f23bf6264007@linux.intel.com
2020-08-04fbmem: pull fbcon_update_vcs() out of fb_set_var()Tetsuo Handa
syzbot is reporting OOB read bug in vc_do_resize() [1] caused by memcpy() based on outdated old_{rows,row_size} values, for resize_screen() can recurse into vc_do_resize() which changes vc->vc_{cols,rows} that outdates old_{rows,row_size} values which were saved before calling resize_screen(). Daniel Vetter explained that resize_screen() should not recurse into fbcon_update_vcs() path due to FBINFO_MISC_USEREVENT being still set when calling resize_screen(). Instead of masking FBINFO_MISC_USEREVENT before calling fbcon_update_vcs(), we can remove FBINFO_MISC_USEREVENT by calling fbcon_update_vcs() only if fb_set_var() returned 0. This change assumes that it is harmless to call fbcon_update_vcs() when fb_set_var() returned 0 without reaching fb_notifier_call_chain(). [1] https://syzkaller.appspot.com/bug?id=c70c88cfd16dcf6e1d3c7f0ab8648b3144b5b25e Reported-and-tested-by: syzbot <syzbot+c37a14770d51a085a520@syzkaller.appspotmail.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: kernel test robot <lkp@intel.com> for missing #include Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/075b7e37-3278-cd7d-31ab-c5073cfa8e92@i-love.sakura.ne.jp
2020-08-01vgaarb: mark vga_tryget staticChristoph Hellwig
This symbols isn't used anywhere outside of vgaarb.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200801061713.307434-1-hch@lst.de
2020-07-24Merge tag 'drm/tegra/for-5.9-rc1' of ↵Dave Airlie
ssh://git.freedesktop.org/git/tegra/linux into drm-next drm/tegra: Changes for v5.9-rc1 This set of patches contains a few preparatory patches to enable video capture support from external camera modules. This is a dependency for the V4L2 driver patches that will likely be merged in v5.9 or v5.10. On top of that there are a couple of fixes across the board as well as some improvements. From a feature point of view this also adds support for horizontal reflection and 180° rotation of planes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200717162011.1661788-1-thierry.reding@gmail.com
2020-07-24Merge v5.8-rc6 into drm-nextDave Airlie
I've got a silent conflict + two trees based on fixes to merge. Fixes a silent merge with amdgpu Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-07-23Merge tag 'drm-xilinx-dpsub-20200718' of git://linuxtv.org/pinchartl/media ↵Dave Airlie
into drm-next Xilinx ZynqMP DisplayPort Subsystem driver Signed-off-by: Dave Airlie <airlied@redhat.com> From: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200718001755.GA5962@pendragon.ideasonboard.com
2020-07-21dma-fence: prime lockdep annotationsDaniel Vetter
Two in one go: - it is allowed to call dma_fence_wait() while holding a dma_resv_lock(). This is fundamental to how eviction works with ttm, so required. - it is allowed to call dma_fence_wait() from memory reclaim contexts, specifically from shrinker callbacks (which i915 does), and from mmu notifier callbacks (which amdgpu does, and which i915 sometimes also does, and probably always should, but that's kinda a debate). Also for stuff like HMM we really need to be able to do this, or things get real dicey. Consequence is that any critical path necessary to get to a dma_fence_signal for a fence must never a) call dma_resv_lock nor b) allocate memory with GFP_KERNEL. Also by implication of dma_resv_lock(), no userspace faulting allowed. That's some supremely obnoxious limitations, which is why we need to sprinkle the right annotations to all relevant paths. The one big locking context we're leaving out here is mmu notifiers, added in commit 23b68395c7c78a764e8963fc15a7cfd318bf187f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Mon Aug 26 22:14:21 2019 +0200 mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end that one covers a lot of other callsites, and it's also allowed to wait on dma-fences from mmu notifiers. But there's no ready-made functions exposed to prime this, so I've left it out for now. v2: Also track against mmu notifier context. v3: kerneldoc to spec the cross-driver contract. Note that currently i915 throws in a hard-coded 10s timeout on foreign fences (not sure why that was done, but it's there), which is why that rule is worded with SHOULD instead of MUST. Also some of the mmu_notifier/shrinker rules might surprise SoC drivers, I haven't fully audited them all. Which is infeasible anyway, we'll need to run them with lockdep and dma-fence annotations and see what goes boom. v4: A spelling fix from Mika v5: #ifdef for CONFIG_MMU_NOTIFIER. Reported by 0day. Unfortunately this means lockdep enforcement is slightly inconsistent, it won't spot GFP_NOIO and GFP_NOFS allocations in the wrong spot if CONFIG_MMU_NOTIFIER is disabled in the kernel config. Oh well. v5: Note that only drivers/gpu has a reasonable (or at least historical) excuse to use dma_fence_wait() from shrinker and mmu notifier callbacks. Everyone else should either have a better memory manager model, or better hardware. This reflects discussions with Jason Gunthorpe. Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: kernel test robot <lkp@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> (v4) Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@intel.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200707201229.472834-3-daniel.vetter@ffwll.ch
2020-07-21dma-fence: basic lockdep annotationsDaniel Vetter
Design is similar to the lockdep annotations for workers, but with some twists: - We use a read-lock for the execution/worker/completion side, so that this explicit annotation can be more liberally sprinkled around. With read locks lockdep isn't going to complain if the read-side isn't nested the same way under all circumstances, so ABBA deadlocks are ok. Which they are, since this is an annotation only. - We're using non-recursive lockdep read lock mode, since in recursive read lock mode lockdep does not catch read side hazards. And we _very_ much want read side hazards to be caught. For full details of this limitation see commit e91498589746065e3ae95d9a00b068e525eec34f Author: Peter Zijlstra <peterz@infradead.org> Date: Wed Aug 23 13:13:11 2017 +0200 locking/lockdep/selftests: Add mixed read-write ABBA tests - To allow nesting of the read-side explicit annotations we explicitly keep track of the nesting. lock_is_held() allows us to do that. - The wait-side annotation is a write lock, and entirely done within dma_fence_wait() for everyone by default. - To be able to freely annotate helper functions I want to make it ok to call dma_fence_begin/end_signalling from soft/hardirq context. First attempt was using the hardirq locking context for the write side in lockdep, but this forces all normal spinlocks nested within dma_fence_begin/end_signalling to be spinlocks. That bollocks. The approach now is to simple check in_atomic(), and for these cases entirely rely on the might_sleep() check in dma_fence_wait(). That will catch any wrong nesting against spinlocks from soft/hardirq contexts. The idea here is that every code path that's critical for eventually signalling a dma_fence should be annotated with dma_fence_begin/end_signalling. The annotation ideally starts right after a dma_fence is published (added to a dma_resv, exposed as a sync_file fd, attached to a drm_syncobj fd, or anything else that makes the dma_fence visible to other kernel threads), up to and including the dma_fence_wait(). Examples are irq handlers, the scheduler rt threads, the tail of execbuf (after the corresponding fences are visible), any workers that end up signalling dma_fences and really anything else. Not annotated should be code paths that only complete fences opportunistically as the gpu progresses, like e.g. shrinker/eviction code. The main class of deadlocks this is supposed to catch are: Thread A: mutex_lock(A); mutex_unlock(A); dma_fence_signal(); Thread B: mutex_lock(A); dma_fence_wait(); mutex_unlock(A); Thread B is blocked on A signalling the fence, but A never gets around to that because it cannot acquire the lock A. Note that dma_fence_wait() is allowed to be nested within dma_fence_begin/end_signalling sections. To allow this to happen the read lock needs to be upgraded to a write lock, which means that any other lock is acquired between the dma_fence_begin_signalling() call and the call to dma_fence_wait(), and still held, this will result in an immediate lockdep complaint. The only other option would be to not annotate such calls, defeating the point. Therefore these annotations cannot be sprinkled over the code entirely mindless to avoid false positives. Originally I hope that the cross-release lockdep extensions would alleviate the need for explicit annotations: https://lwn.net/Articles/709849/ But there's a few reasons why that's not an option: - It's not happening in upstream, since it got reverted due to too many false positives: commit e966eaeeb623f09975ef362c2866fae6f86844f9 Author: Ingo Molnar <mingo@kernel.org> Date: Tue Dec 12 12:31:16 2017 +0100 locking/lockdep: Remove the cross-release locking checks This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs initially, was also causing too many false positives that caused people to disable lockdep - which is arguably a worse overall outcome. - cross-release uses the complete() call to annotate the end of critical sections, for dma_fence that would be dma_fence_signal(). But we do not want all dma_fence_signal() calls to be treated as critical, since many are opportunistic cleanup of gpu requests. If these get stuck there's still the main completion interrupt and workers who can unblock everyone. Automatically annotating all dma_fence_signal() calls would hence cause false positives. - cross-release had some educated guesses for when a critical section starts, like fresh syscall or fresh work callback. This would again cause false positives without explicit annotations, since for dma_fence the critical sections only starts when we publish a fence. - Furthermore there can be cases where a thread never does a dma_fence_signal, but is still critical for reaching completion of fences. One example would be a scheduler kthread which picks up jobs and pushes them into hardware, where the interrupt handler or another completion thread calls dma_fence_signal(). But if the scheduler thread hangs, then all the fences hang, hence we need to manually annotate it. cross-release aimed to solve this by chaining cross-release dependencies, but the dependency from scheduler thread to the completion interrupt handler goes through hw where cross-release code can't observe it. In short, without manual annotations and careful review of the start and end of critical sections, cross-relese dependency tracking doesn't work. We need explicit annotations. v2: handle soft/hardirq ctx better against write side and dont forget EXPORT_SYMBOL, drivers can't use this otherwise. v3: Kerneldoc. v4: Some spelling fixes from Mika v5: Amend commit message to explain in detail why cross-release isn't the solution. v6: Pull out misplaced .rst hunk. Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@intel.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200707201229.472834-2-daniel.vetter@ffwll.ch
2020-07-19Merge tag 'sched-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull scheduler fixes from Thomas Gleixner: "A set of scheduler fixes: - Plug a load average accounting race which was introduced with a recent optimization casing load average to show bogus numbers. - Fix the rseq CPU id initialization for new tasks. sched_fork() does not update the rseq CPU id so the id is the stale id of the parent task, which can cause user space data corruption. - Handle a 0 return value of task_h_load() correctly in the load balancer, which does not decrease imbalance and therefore pulls until the maximum number of loops is reached, which might be all tasks just created by a fork bomb" * tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: handle case of task_h_load() returning 0 sched: Fix unreliable rseq cpu_id for new tasks sched: Fix loadavg accounting race
2020-07-19Merge tag 'dma-mapping-5.8-6' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping into master Pull dma-mapping fixes from Christoph Hellwig: "Ensure we always have fully addressable memory in the dma coherent pool (Nicolas Saenz Julienne)" * tag 'dma-mapping-5.8-6' of git://git.infradead.org/users/hch/dma-mapping: dma-pool: do not allocate pool memory from CMA dma-pool: make sure atomic pool suits device dma-pool: introduce dma_guess_pool() dma-pool: get rid of dma_in_atomic_pool() dma-direct: provide function to check physical memory area validity
2020-07-17Merge tag 'fuse-fixes-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into master Pull fuse fixes from Miklos Szeredi: - two regressions in this cycle caused by the conversion of writepage list to an rb_tree - two regressions in v5.4 cause by the conversion to the new mount API - saner behavior of fsconfig(2) for the reconfigure case - an ancient issue with FS_IOC_{GET,SET}FLAGS ioctls * tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS fuse: don't ignore errors from fuse_writepages_fill() fuse: clean up condition for writepage sending fuse: reject options on reconfigure via fsconfig(2) fuse: ignore 'data' argument of mount(..., MS_REMOUNT) fuse: use ->reconfigure() instead of ->remount_fs() fuse: fix warning in tree_insert() and clean up writepage insertion fuse: move rb_erase() before tree_insert()
2020-07-17gpu: host1x: mipi: Split tegra_mipi_calibrate() and tegra_mipi_wait()Sowjanya Komatineni
SW can trigger MIPI pads calibration any time after power on but calibration results will be latched and applied to the pads by MIPI CAL unit only when the link is in LP-11 state and then status register will be updated. For CSI, trigger of pads calibration happen during CSI stream enable where CSI receiver is kept ready prior to sensor or CSI transmitter stream start. So, pads may not be in LP-11 at this time and waiting for the calibration to be done immediate after calibration start will result in timeout. This patch splits tegra_mipi_calibrate() and tegra_mipi_wait() so triggering for calibration and waiting for it to complete can happen at different stages. Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-07-17gpu: host1x: mipi: Update tegra_mipi_request() to be node basedSowjanya Komatineni
Tegra CSI driver need a separate MIPI device for each channel as calibration of corresponding MIPI pads for each channel should happen independently. So, this patch updates tegra_mipi_request() API to add a device_node pointer argument to allow creating mipi device for specific device node rather than a device. Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-07-17dmaengine: Add support for repeating transactionsLaurent Pinchart
DMA engines used with displays perform 2D interleaved transfers to read framebuffers from memory and feed the data to the display engine. As the same framebuffer can be displayed for multiple frames, the DMA transactions need to be repeated until a new framebuffer replaces the current one. This feature is implemented natively by some DMA engines that have the ability to repeat transactions and switch to a new transaction at the end of a transfer without any race condition or frame loss. This patch implements support for this feature in the DMA engine API. A new DMA_PREP_REPEAT transaction flag allows DMA clients to instruct the DMA channel to repeat the transaction automatically until one or more new transactions are issued on the channel (or until all active DMA transfers are explicitly terminated with the dmaengine_terminate_*() functions). A new DMA_REPEAT transaction type is also added for DMA engine drivers to report their support of the DMA_PREP_REPEAT flag. A new DMA_PREP_LOAD_EOT transaction flag is also introduced (with a corresponding DMA_LOAD_EOT capability bit), as requested during the review of v4. The flag instructs the DMA channel that the transaction being queued should replace the active repeated transaction when the latter terminates (at End Of Transaction). Not setting the flag will result in the active repeated transaction to continue being repeated, and the new transaction being silently ignored. The DMA_PREP_REPEAT flag is currently supported for interleaved transactions only. Its usage can easily be extended to cover more transaction types simply by adding an appropriate check in the corresponding dmaengine_prep_*() function. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Link: https://lore.kernel.org/r/20200717013337.24122-3-laurent.pinchart@ideasonboard.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-16Merge tag 'drm-fixes-2020-07-17-1' of git://anongit.freedesktop.org/drm/drm ↵Linus Torvalds
into master Pull drm fixes from Dave Airlie: "Weekly fixes pull, big bigger than I'd normally like, but they are fairly scattered and small individually. The vmwgfx one is a black screen regression, otherwise the largest is an MST encoder fix for amdgpu which results in a WARN in some cases, and a scattering of i915 fixes. I'm tracking two regressions at the moment that hopefully we get nailed down this week for rc7. dma-buf: - sleeping atomic fix amdgpu: - Fix a race condition with KIQ - Preemption fix - Fix handling of fake MST encoders - OLED panel fix - Handle allocation failure in stream construction - Renoir SMC fix - SDMA 5.x fix i915: - FBC w/a stride fix - Fix use-after-free fix on module reload - Ignore irq enabling on the virtual engines to fix device sleep - Use GTT when saving/restoring engine GPR - Fix selftest sort function vmwgfx: - black screen fix aspeed: - fbcon init warn fix" * tag 'drm-fixes-2020-07-17-1' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu/sdma5: fix wptr overwritten in ->get_wptr() drm/amdgpu/powerplay: Modify SMC message name for setting power profile mode drm/amd/display: handle failed allocation during stream construction drm/amd/display: OLED panel backlight adjust not work with external display connected drm/amdgpu/display: create fake mst encoders ahead of time (v4) drm/amdgpu: fix preemption unit test drm/amdgpu/gfx10: fix race condition for kiq drm/i915: Recalculate FBC w/a stride when needed drm/i915: Move cec_notifier to intel_hdmi_connector_unregister, v2. drm/i915/gt: Only swap to a random sibling once upon creation drm/i915/gt: Ignore irq enabling on the virtual engines drm/i915/perf: Use GTT when saving/restoring engine GPR drm/i915/selftests: Fix compare functions provided for sorting drm/vmwgfx: fix update of display surface when resolution changes dmabuf: use spinlock to access dmabuf->name drm/aspeed: Call drm_fbdev_generic_setup after drm_dev_register
2020-07-16Merge tag 'driver-core-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into master Pull driver core fixes from Greg KH: "Here are 3 driver core fixes for 5.8-rc6. They resolve some issues found with the deferred probe code for some types of devices on some embedded systems. They have been tested a bunch and have been in linux-next for a while with no reported issues" * tag 'driver-core-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Avoid deferred probe due to fw_devlink_pause/resume() driver core: Rename dev_links_info.defer_sync to defer_hook driver core: Don't do deferred probe in parallel with kernel_init thread
2020-07-16Merge tag 'tty-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty into master Pull tty/serial driver fixes from Greg KH: :Here are some small tty and serial driver fixes for 5.8-rc6. The largest set of patches in here is a revert of the sysrq changes that went into 5.8-rc1 but turned out to cause a noticable overhead and cpu usage. Other than that, there's a few small serial driver fixes to resolve reported issues, and finally resolving the spinlock init problem on many serial driver consoles. All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: core: Initialise spin lock before use in uart_configure_port() serial: mxs-auart: add missed iounmap() in probe failure and remove serial: sh-sci: Initialize spinlock for uart console Revert "tty: xilinx_uartps: Fix missing id assignment to the console" serial: core: drop redundant sysrq checks serial: core: fix sysrq overhead regression Revert "serial: core: Refactor uart_unlock_and_check_sysrq()" tty/serial: fix serial_core.c kernel-doc warnings tty: serial: cpm_uart: Fix behaviour for non existing GPIOs
2020-07-16Merge tag 'drm-misc-fixes-2020-07-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes * aspeed: setup fbdev console after registering device; avoids warning and stacktrace in dmesg log * dmabuf: protect dmabuf->name with a spinlock; avoids sleeping in atomic context Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200715171756.GA18606@linux-uq9g
2020-07-14dma-direct: provide function to check physical memory area validityNicolas Saenz Julienne
dma_coherent_ok() checks if a physical memory area fits a device's DMA constraints. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-14fuse: reject options on reconfigure via fsconfig(2)Miklos Szeredi
Previous patch changed handling of remount/reconfigure to ignore all options, including those that are unknown to the fuse kernel fs. This was done for backward compatibility, but this likely only affects the old mount(2) API. The new fsconfig(2) based reconfiguration could possibly be improved. This would make the new API less of a drop in replacement for the old, OTOH this is a good chance to get rid of some weirdnesses in the old API. Several other behaviors might make sense: 1) unknown options are rejected, known options are ignored 2) unknown options are rejected, known options are rejected if the value is changed, allowed otherwise 3) all options are rejected Prior to the backward compatibility fix to ignore all options all known options were accepted (1), even if they change the value of a mount parameter; fuse_reconfigure() does not look at the config values set by fuse_parse_param(). To fix that we'd need to verify that the value provided is the same as set in the initial configuration (2). The major drawback is that this is much more complex than just rejecting all attempts at changing options (3); i.e. all options signify initial configuration values and don't make sense on reconfigure. This patch opts for (3) with the rationale that no mount options are reconfigurable in fuse. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "A few quirks for the Elan touchpad driver, another Thinkpad is being switched over from PS/2 to native RMI4 interface, and we gave a brand new SW_MACHINE_COVER switch definition" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - add more hardware ID for Lenovo laptops Input: i8042 - add Lenovo XiaoXin Air 12 to i8042 nomux list Revert "Input: elants_i2c - report resolution information for touch major" Input: elan_i2c - only increment wakeup count on touch Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen ARM: dts: n900: remove mmc1 card detect gpio Input: add `SW_MACHINE_COVER`
2020-07-11Merge tag 'riscv-for-linus-5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "I have a few KGDB-related fixes. They're mostly fixes for build warnings, but there's also: - Support for the qSupported and qXfer packets, which are necessary to pass around GDB XML information which we need for the RISC-V GDB port to fully function. - Users can now select STRICT_KERNEL_RWX instead of forcing it on" * tag 'riscv-for-linus-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Avoid kgdb.h including gdb_xml.h to solve unused-const-variable warning kgdb: Move the extern declaration kgdb_has_hit_break() to generic kgdb.h riscv: Fix "no previous prototype" compile warning in kgdb.c file riscv: enable the Kconfig prompt of STRICT_KERNEL_RWX kgdb: enable arch to support XML packet.
2020-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking BPF programs, from Maciej Żenczykowski. 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu Mariappan. 3) Slay memory leak in nl80211 bss color attribute parsing code, from Luca Coelho. 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin. 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric Dumazet. 6) xsk code dips too deeply into DMA mapping implementation internals. Add dma_need_sync and use it. From Christoph Hellwig 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko. 8) Check for disallowed attributes when loading flow dissector BPF programs. From Lorenz Bauer. 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from Jason A. Donenfeld. 10) Don't advertise checksum offload on ipa devices that don't support it. From Alex Elder. 11) Resolve several issues in TCP MD5 signature support. Missing memory barriers, bogus options emitted when using syncookies, and failure to allow md5 key changes in established states. All from Eric Dumazet. 12) Fix interface leak in hsr code, from Taehee Yoo. 13) VF reset fixes in hns3 driver, from Huazhong Tan. 14) Make loopback work again with ipv6 anycast, from David Ahern. 15) Fix TX starvation under high load in fec driver, from Tobias Waldekranz. 16) MLD2 payload lengths not checked properly in bridge multicast code, from Linus Lüssing. 17) Packet scheduler code that wants to find the inner protocol currently only works for one level of VLAN encapsulation. Allow Q-in-Q situations to work properly here, from Toke Høiland-Jørgensen. 18) Fix route leak in l2tp, from Xin Long. 19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport support and various protocols. From Martin KaFai Lau. 20) Fix socket cgroup v2 reference counting in some situations, from Cong Wang. 21) Cure memory leak in mlx5 connection tracking offload support, from Eli Britstein. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) mlxsw: pci: Fix use-after-free in case of failed devlink reload mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON() net: macb: fix call to pm_runtime in the suspend/resume functions net: macb: fix macb_suspend() by removing call to netif_carrier_off() net: macb: fix macb_get/set_wol() when moving to phylink net: macb: mark device wake capable when "magic-packet" property present net: macb: fix wakeup test in runtime suspend/resume routines bnxt_en: fix NULL dereference in case SR-IOV configuration fails libbpf: Fix libbpf hashmap on (I)LP32 architectures net/mlx5e: CT: Fix memory leak in cleanup net/mlx5e: Fix port buffers cell size value net/mlx5e: Fix 50G per lane indication net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash net/mlx5e: Fix VXLAN configuration restore after function reload net/mlx5e: Fix usage of rcu-protected pointer net/mxl5e: Verify that rpriv is not NULL net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode net/mlx5: Fix eeprom support for SFP module cgroup: Fix sock_cgroup_data on big-endian. selftests: bpf: Fix detach from sockmap tests ...
2020-07-10Merge tag 'mlx5-fixes-2020-07-02' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-07-02 This series introduces some fixes to mlx5 driver. V1->v2: - Drop "ip -s" patch and mirred device hold reference patch. - Will revise them in a later submission. Please pull and let me know if there is any problem. For -stable v5.2 ('net/mlx5: Fix eeprom support for SFP module') For -stable v5.4 ('net/mlx5e: Fix 50G per lane indication') For -stable v5.5 ('net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash') ('net/mlx5e: Fix VXLAN configuration restore after function reload') For -stable v5.7 ('net/mlx5e: CT: Fix memory leak in cleanup') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/miscLinus Torvalds
Pull in-kernel read and write op cleanups from Christoph Hellwig: "Cleanup in-kernel read and write operations Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure all users of in-kernel file I/O use them if they don't use iov_iter based methods already. The new WARN_ONs in combination with syzcaller already found a missing input validation in 9p. The fix should be on your way through the maintainer ASAP". [ This is prep-work for the real changes coming 5.9 ] * tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc: fs: remove __vfs_read fs: implement kernel_read using __kernel_read integrity/ima: switch to using __kernel_read fs: add a __kernel_read helper fs: remove __vfs_write fs: implement kernel_write using __kernel_write fs: check FMODE_WRITE in __kernel_write fs: unexport __kernel_write bpfilter: switch to kernel_write autofs: switch to kernel_write cachefiles: switch to kernel_write
2020-07-10Merge tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: - add a warning when the atomic pool is depleted (David Rientjes) - protect the parameters of the new scatterlist helper macros (Marek Szyprowski ) * tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping: scatterlist: protect parameters of the sg_table related macros dma-mapping: warn when coherent pool is depleted
2020-07-10Merge tag 'gfs2-v5.8-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows gfs2 to use the generic fiel read iterator functions without having to worry about being called back while holding locks". * tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Rework read and page fault locking fs: Add IOCB_NOIO flag for generic_file_read_iter
2020-07-10fbdev/fb.h: Use struct_size() helper in kzalloc()Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200617175647.GA26370@embeddedor
2020-07-10driver core: Avoid deferred probe due to fw_devlink_pause/resume()Saravana Kannan
With the earlier patch in this series, all devices that deferred probe due to fw_devlink_pause() will have their probes delayed till the deferred probe thread is kicked off during late_initcall. This will also affect all their consumers. This delayed probing in unnecessary. So this patch just keeps track of the devices that had their probe deferred due to fw_devlink_pause() and attempts to probe them once during fw_devlink_resume(). Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200701194259.3337652-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10driver core: Rename dev_links_info.defer_sync to defer_hookSaravana Kannan
The defer_sync field is used as a hook to add the device to the deferred_sync list. Rename it so that it's more meaningful for the next patch that'll also use this field as a hook to a deferred_fw_devlink list. Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200701194259.3337652-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10dmabuf: use spinlock to access dmabuf->nameCharan Teja Kalla
There exists a sleep-while-atomic bug while accessing the dmabuf->name under mutex in the dmabuffs_dname(). This is caused from the SELinux permissions checks on a process where it tries to validate the inherited files from fork() by traversing them through iterate_fd() (which traverse files under spin_lock) and call match_file(security/selinux/hooks.c) where the permission checks happen. This audit information is logged using dump_common_audit_data() where it calls d_path() to get the file path name. If the file check happen on the dmabuf's fd, then it ends up in ->dmabuffs_dname() and use mutex to access dmabuf->name. The flow will be like below: flush_unauthorized_files() iterate_fd() spin_lock() --> Start of the atomic section. match_file() file_has_perm() avc_has_perm() avc_audit() slow_avc_audit() common_lsm_audit() dump_common_audit_data() audit_log_d_path() d_path() dmabuffs_dname() mutex_lock()--> Sleep while atomic. Call trace captured (on 4.19 kernels) is below: ___might_sleep+0x204/0x208 __might_sleep+0x50/0x88 __mutex_lock_common+0x5c/0x1068 __mutex_lock_common+0x5c/0x1068 mutex_lock_nested+0x40/0x50 dmabuffs_dname+0xa0/0x170 d_path+0x84/0x290 audit_log_d_path+0x74/0x130 common_lsm_audit+0x334/0x6e8 slow_avc_audit+0xb8/0xf8 avc_has_perm+0x154/0x218 file_has_perm+0x70/0x180 match_file+0x60/0x78 iterate_fd+0x128/0x168 selinux_bprm_committing_creds+0x178/0x248 security_bprm_committing_creds+0x30/0x48 install_exec_creds+0x1c/0x68 load_elf_binary+0x3a4/0x14e0 search_binary_handler+0xb0/0x1e0 So, use spinlock to access dmabuf->name to avoid sleep-while-atomic. Cc: <stable@vger.kernel.org> [5.3+] Signed-off-by: Charan Teja Kalla <charante@codeaurora.org> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Acked-by: Christian König <christian.koenig@amd.com> [sumits: added comment to spinlock_t definition to avoid warning] Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/a83e7f0d-4e54-9848-4b58-e1acdbe06735@codeaurora.org
2020-07-09kgdb: Move the extern declaration kgdb_has_hit_break() to generic kgdb.hVincent Chen
Currently, only riscv kgdb.c uses the kgdb_has_hit_break() to identify the kgdb breakpoint. It causes other architectures will encounter the "no previous prototype" warnings if the compile option has W=1. Moving the declaration of extern kgdb_has_hit_break() from risc-v kgdb.h to generic kgdb.h to avoid generating these warnings. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-07-09kgdb: enable arch to support XML packet.Vincent Chen
The XML packet could be supported by required architecture if the architecture defines CONFIG_HAVE_ARCH_KGDB_QXFER_PKT and implement its own kgdb_arch_handle_qxfer_pkt(). Except for the kgdb_arch_handle_qxfer_pkt(), the architecture also needs to record the feature supported by gdb stub into the kgdb_arch_gdb_stub_feature, and these features will be reported to host gdb when gdb stub receives the qSupported packet. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-07-09net/mlx5e: Fix port buffers cell size valueEran Ben Elisha
Device unit for port buffers size, xoff_threshold and xon_threshold is cells. Fix a bug in driver where cell unit size was hard-coded to 128 bytes. This hard-coded value is buggy, as it is wrong for some hardware versions. Driver to read cell size from SBCAM register and translate bytes to cell units accordingly. In order to fix the bug, this patch exposes SBCAM (Shared buffer capabilities mask) layout and defines. If SBCAM.cap_cell_size is valid, use it for all bytes to cells calculations. If not valid, fallback to 128. Cell size do not change on the fly per device. Instead of issuing SBCAM access reg command every time such translation is needed, cache it in mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz as a param to every function that needs bytes to cells translation. While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to en_dcbnl.c, as it is only used by that file. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-09cgroup: Fix sock_cgroup_data on big-endian.Cong Wang
In order for no_refcnt and is_data to be the lowest order two bits in the 'val' we have to pad out the bitfield of the u8. Fixes: ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09Merge tag 'kallsyms_show_value-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kallsyms fix from Kees Cook: "Refactor kallsyms_show_value() users for correct cred. I'm not delighted by the timing of getting these changes to you, but it does fix a handful of kernel address exposures, and no one has screamed yet at the patches. Several users of kallsyms_show_value() were performing checks not during "open". Refactor everything needed to gain proper checks against file->f_cred for modules, kprobes, and bpf" * tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests: kmod: Add module address visibility test bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() kprobes: Do not expose probe addresses to non-CAP_SYSLOG module: Do not expose section addresses to non-CAP_SYSLOG module: Refactor section attr into bin attribute kallsyms: Refactor kallsyms_show_value() to take cred
2020-07-08Input: elan_i2c - add more hardware ID for Lenovo laptopsDave Wang
This adds more hardware IDs for Elan touchpads found in various Lenovo laptops. Signed-off-by: Dave Wang <dave.wang@emc.com.tw> Link: https://lore.kernel.org/r/000201d5a8bd$9fead3f0$dfc07bd0$@emc.com.tw Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-07-08bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()Kees Cook
When evaluating access control over kallsyms visibility, credentials at open() time need to be used, not the "current" creds (though in BPF's case, this has likely always been the same). Plumb access to associated file->f_cred down through bpf_dump_raw_ok() and its callers now that kallsysm_show_value() has been refactored to take struct cred. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: bpf@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump") Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08kallsyms: Refactor kallsyms_show_value() to take credKees Cook
In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches. Additionally switch return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return. Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08Raise gcc version requirement to 4.9Linus Torvalds
I realize that we fairly recently raised it to 4.8, but the fact is, 4.9 is a much better minimum version to target. We have a number of workarounds for actual bugs in pre-4.9 gcc versions (including things like internal compiler errors on ARM), but we also have some syntactic workarounds for lacking features. In particular, raising the minimum to 4.9 means that we can now just assume _Generic() exists, which is likely the much better replacement for a lot of very convoluted built-time magic with conditionals on sizeof and/or __builtin_choose_expr() with same_type() etc. Using _Generic also means that you will need to have a very recent version of 'sparse', but thats easy to build yourself, and much less of a hassle than some old gcc version can be. The latest (in a long string) of reasons for minimum compiler version upgrades was commit 5435f73d5c4a ("efi/x86: Fix build with gcc 4"). Ard points out that RHEL 7 uses gcc-4.8, but the people who stay back on old RHEL versions persumably also don't build their own kernels anyway. And maybe they should cross-built or just have a little side affair with a newer compiler? Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-08sched: Fix loadavg accounting racePeter Zijlstra
The recent commit: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") moved these lines in ttwu(): p->sched_contributes_to_load = !!task_contributes_to_load(p); p->state = TASK_WAKING; up before: smp_cond_load_acquire(&p->on_cpu, !VAL); into the 'p->on_rq == 0' block, with the thinking that once we hit schedule() the current task cannot change it's ->state anymore. And while this is true, it is both incorrect and flawed. It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0' to avoid weak hardware from re-ordering things for us. This can fairly easily be achieved by relying on the control-dependency already in place. The second problem, which makes the flaw in the original argument, is that while schedule() will not change prev->state, it will read it a number of times (arguably too many times since it's marked volatile). The previous condition 'p->on_cpu == 0' was sufficient because that indicates schedule() has completed, and will no longer read prev->state. So now the trick is to make this same true for the (much) earlier 'prev->on_rq == 0' case. Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0' assignment needs to he a RELEASE, but adding additional ordering to schedule() is an unwelcome proposition at the best of times, doubly so for mere accounting. Luckily we can push the prev->state load up before rq->lock, with the only caveat that we then have to re-read the state after. However, we know that if it changed, we no longer have to worry about the blocking path. This gives us the required ordering, if we block, we did the prev->state load before an (effective) smp_mb() and the p->on_rq store needs not change. With this we end up with the effective ordering: LOAD p->state LOAD-ACQUIRE p->on_rq == 0 MB STORE p->on_rq, 0 STORE p->state, TASK_WAKING which ensures the TASK_WAKING store happens after the prev->state load, and all is well again. Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Dave Jones <davej@codemonkey.org.uk> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
2020-07-08fs: remove __vfs_readChristoph Hellwig
Fold it into the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: add a __kernel_read helperChristoph Hellwig
This is the counterpart to __kernel_write, and skip the rw_verify_area call compared to kernel_read. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-07vlan: consolidate VLAN parsing code and limit max parsing depthToke Høiland-Jørgensen
Toshiaki pointed out that we now have two very similar functions to extract the L3 protocol number in the presence of VLAN tags. And Daniel pointed out that the unbounded parsing loop makes it possible for maliciously crafted packets to loop through potentially hundreds of tags. Fix both of these issues by consolidating the two parsing functions and limiting the VLAN tag parsing to a max depth of 8 tags. As part of this, switch over __vlan_get_protocol() to use skb_header_pointer() instead of pskb_may_pull(), to avoid the possible side effects of the latter and keep the skb pointer 'const' through all the parsing functions. v2: - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT) Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Reported-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07fs: Add IOCB_NOIO flag for generic_file_read_iterAndreas Gruenbacher
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it shouldn't trigger any filesystem I/O for the actual request or for readahead. This allows to do tentative reads out of the page cache as some filesystems allow, and to take the appropriate locks and retry the reads only if the requested pages are not cached. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-07cgroup: fix cgroup_sk_alloc() for sk_clone_lock()Cong Wang
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is copied, so the cgroup refcnt must be taken too. And, unlike the sk_alloc() path, sock_update_netprioidx() is not called here. Therefore, it is safe and necessary to grab the cgroup refcnt even when cgroup_sk_alloc is disabled. sk_clone_lock() is in BH context anyway, the in_interrupt() would terminate this function if called there. And for sk_alloc() skcd->val is always zero. So it's safe to factor out the code to make it more readable. The global variable 'cgroup_sk_alloc_disabled' is used to determine whether to take these reference counts. It is impossible to make the reference counting correct unless we save this bit of information in skcd->val. So, add a new bit there to record whether the socket has already taken the reference counts. This obviously relies on kmalloc() to align cgroup pointers to at least 4 bytes, ARCH_KMALLOC_MINALIGN is certainly larger than that. This bug seems to be introduced since the beginning, commit d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") tried to fix it but not compeletely. It seems not easy to trigger until the recent commit 090e28b229af ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged. Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Reported-by: Cameron Berkenpas <cam@neo-zeon.de> Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reported-by: Daniël Sonck <dsonck92@gmail.com> Reported-by: Zhang Qiang <qiang.zhang@windriver.com> Tested-by: Cameron Berkenpas <cam@neo-zeon.de> Tested-by: Peter Geis <pgwipeout@gmail.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Zefan Li <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-06scatterlist: protect parameters of the sg_table related macrosMarek Szyprowski
Add brackets to protect parameters of the recently added sg_table related macros from side-effects. Fixes: 709d6d73c756 ("scatterlist: add generic wrappers for iterating over sgtable objects") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-03sched: consistently handle layer3 header accesses in the presence of VLANsToke Høiland-Jørgensen
There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-03Merge tag 'pci-v5.8-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix a pcie_find_root_port() simplification that broke power management because it didn't handle the edge case of finding the Root Port of a Root Port itself (Mika Westerberg)"" * tag 'pci-v5.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Make pcie_find_root_port() work for Root Ports
2020-07-02Merge tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Use kvfree_sensitive() for the block keyslot free (Eric) - Sync blk-mq debugfs flags (Hou) - Memory leak fix in virtio-blk error path (Hou) * tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-block: virtio-blk: free vblk-vqs in error path of virtblk_probe() block/keyslot-manager: use kvfree_sensitive() blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flags