Commit Graph

1324854 Commits

Author SHA1 Message Date
David Hildenbrand
a9403425b3 virtio-mem: mark device ready before registering callbacks in kdump mode
After the callbacks are registered we may immediately get a callback. So
mark the device ready before registering the callbacks.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-10-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:25 -05:00
David Hildenbrand
7ad4d1f6e6 fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
the crashed kernel.

RAM provided by memory devices such as virtio-mem can only be detected
using the device driver; when vmcore_init() is called, these device
drivers are usually not loaded yet, or the devices did not get probed
yet. Consequently, on s390 these RAM ranges will not be included in
the crash dump, which makes the dump partially corrupt and is
unfortunate.

Instead of deferring the vmcore_init() call, to an (unclear?) later point,
let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
the device drivers probe the device and get access to this information.

Then, we'll add these ranges to the vmcore, adding more PT_LOAD
entries and updating the offsets+vmcore size.

Use a separate Kconfig option to be set by an architecture to include this
code only if the arch really needs it. Further, we'll make the config
depend on the relevant drivers (i.e., virtio_mem) once they implement
support (next). The alternative of having a PROVIDE_PROC_VMCORE_DEVICE_RAM
config option was dropped for now for simplicity.

The current target use case is s390, which only creates an elf64
elfcore, so focusing on elf64 is sufficient.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-9-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:19 -05:00
David Hildenbrand
e29e9acae0 fs/proc/vmcore: factor out freeing a list of vmcore ranges
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.

Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-8-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:16 -05:00
David Hildenbrand
e017b1f4aa fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.

Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-7-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:11 -05:00
David Hildenbrand
819403c893 fs/proc/vmcore: move vmcore definitions out of kcore.h
These vmcore defines are not related to /proc/kcore, move them out.

We'll move "struct vmcoredd_node" to vmcore.c, because it is only used
internally. While "struct vmcore" is only used internally for now,
we're planning on using it from inline functions in crash_dump.h next,
so move it to crash_dump.h.

While at it, rename "struct vmcore" to "struct vmcore_range", which is a
more suitable name and will make the usage of it outside of vmcore.c
clearer.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-6-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:07 -05:00
David Hildenbrand
8e386957cc fs/proc/vmcore: prefix all pr_* with "vmcore:"
Let's use "vmcore: " as a prefix, converting the single "Kdump:
vmcore not initialized" one to effectively be "vmcore: not initialized".

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-5-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:39:02 -05:00
David Hildenbrand
0f3b1c40c6 fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
The vmcoredd_update_size() call and its effects (size/offset changes) are
currently completely unsynchronized, and will cause trouble when
performed concurrently, or when done while someone is already reading the
vmcore.

Let's protect all vmcore modifications by the vmcore_mutex, disallow vmcore
modifications while the vmcore is open, and warn on vmcore
modifications after the vmcore was already opened once: modifications
while the vmcore is open are unsafe, and modifications after the vmcore
was opened indicates trouble. Properly synchronize against concurrent
opening of the vmcore.

No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.

It's worth noting that modifications after the vmcore was opened are
completely unexpected, so failing if open, and warning if already opened
(+closed again) is good enough.

This change not only handles concurrent adding of device dumps +
concurrent reading of the vmcore properly, it also prepares for other
mechanisms that will modify the vmcore.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-4-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:38:57 -05:00
David Hildenbrand
2083dfe45e fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
Now that we have a mutex that synchronizes against opening of the vmcore,
let's use that one to replace vmcoredd_mutex: there is no need to have
two separate ones.

This is a preparation for properly preventing vmcore modifications
after the vmcore was opened.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-3-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:38:52 -05:00
David Hildenbrand
cdbc69716f fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
We want to protect vmcore modifications from concurrent opening of
the vmcore, and also serialize vmcore modification.

(a) We can currently modify the vmcore after it was opened. This can happen
    if a vmcoredd is added after the vmcore module was initialized and
    already opened by user space. We want to fix that and prepare for
    new code wanting to serialize against concurrent opening.

(b) To handle it cleanly we need to protect the modifications against
    concurrent opening. As the modifications end up allocating memory and
    can sleep, we cannot rely on the spinlock.

Let's convert the spinlock into a mutex to prepare for further changes.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-2-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-27 09:38:34 -05:00
Yuxue Liu
33bb2d1606 vdpa/vp_vdpa: implement kick_vq_with_data callback
Implement the kick_vq_with_data vDPA callback.
On kick, we pass the next available data to the hardware by writing it in
the kick offset.

Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
Message-Id: <20241203023743.1757-1-yuxue.liu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2025-01-08 07:29:05 -05:00
zhang jiao
302b49daf0 virtio_balloon: Use outer variable 'page'
There is no need to define a local variable 'page',
just use outer variable 'page'.

Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Message-Id: <20241120054920.35291-1-zhangjiao2@cmss.chinamobile.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
2025-01-08 07:29:05 -05:00
Yongji Xie
b989531e0b vduse: relicense under GPL-2.0 OR BSD-3-Clause
Dual-license the vduse kernel header file to dual
GPL-2.0 OR BSD-3-Clause license to make it possible
to ship it with DPDK (under BSD-3-Clause) for older
distros.

Signed-off-by: Yongji Xie <xieyongji@bytedance.com>
Message-Id: <20241119074238.38299-1-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-08 06:37:13 -05:00
Linus Torvalds
9d89551994 Linux 6.13-rc6 v6.13-rc6 2025-01-05 14:13:40 -08:00
Linus Torvalds
9244696b34 Merge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix escaping of '$' in scripts/mksysmap

 - Fix a modpost crash observed with the latest binutils

 - Fix 'provides' in the linux-api-headers pacman package

* tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: pacman-pkg: provide versioned linux-api-headers package
  modpost: work around unaligned data access error
  modpost: refactor do_vmbus_entry()
  modpost: fix the missed iteration for the max bit in do_input()
  scripts/mksysmap: Fix escape chars '$'
2025-01-05 10:52:47 -08:00
Linus Torvalds
5635d8bad2 Merge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
 "25 hotfixes.  16 are cc:stable.  18 are MM and 7 are non-MM.

  The usual bunch of singletons and two doubletons - please see the
  relevant changelogs for details"

* tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  MAINTAINERS: change Arınç _NAL's name and email address
  scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity
  mm/util: make memdup_user_nul() similar to memdup_user()
  mm, madvise: fix potential workingset node list_lru leaks
  mm/damon/core: fix ignored quota goals and filters of newly committed schemes
  mm/damon/core: fix new damon_target objects leaks on damon_commit_targets()
  mm/list_lru: fix false warning of negative counter
  vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  mm: shmem: fix the update of 'shmem_falloc->nr_unswapped'
  mm: shmem: fix incorrect index alignment for within_size policy
  percpu: remove intermediate variable in PERCPU_PTR()
  mm: zswap: fix race between [de]compression and CPU hotunplug
  ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv
  fs/proc/task_mmu: fix pagemap flags with PMD THP entries on 32bit
  kcov: mark in_softirq_really() as __always_inline
  docs: mm: fix the incorrect 'FileHugeMapped' field
  mailmap: modify the entry for Mathieu Othacehe
  mm/kmemleak: fix sleeping function called from invalid context at print message
  mm: hugetlb: independent PMD page table shared count
  maple_tree: reload mas before the second call for mas_empty_area
  ...
2025-01-05 10:37:45 -08:00
Linus Torvalds
7a5b6fc8bd Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
 "A randconfig build fix and a performance fix:

   - Fix the CONFIG_RESET_CONTROLLER=n path signature of
     clk_imx8mp_audiomix_reset_controller_register() to appease
     randconfig

   - Speed up the sdhci clk on TH1520 by a factor of 4 by adding
     a fixed factor clk"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: clk-imx8mp-audiomix: fix function signature
  clk: thead: Fix TH1520 emmc and shdci clock rate
2025-01-05 10:28:34 -08:00
Thomas Weißschuh
385443057f kbuild: pacman-pkg: provide versioned linux-api-headers package
The Arch Linux glibc package contains a versioned dependency on
"linux-api-headers". If the linux-api-headers package provided by
pacman-pkg does not specify an explicit version this dependency is not
satisfied.
Fix the dependency by providing an explicit version.

Fixes: c8578539de ("kbuild: add script and target to generate pacman package")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-05 23:19:17 +09:00
Linus Torvalds
ab75170520 Merge tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fix from Wim Van Sebroeck:

 - fix error message during stm32 driver probe

* tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: stm32_iwdg: fix error message during driver probe
2025-01-04 10:59:10 -08:00
Linus Torvalds
63676eefb7 Merge tag 'sched_ext-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Fix a bug where bpf_iter_scx_dsq_new() was not initializing the
   iterator's flags and could inadvertently enable e.g. reverse
   iteration

 - Fix a bug where scx_ops_bypass() could call irq_restore twice

 - Add Andrea and Changwoo as maintainers for better review coverage

 - selftests and tools/sched_ext build and other fixes

* tag 'sched_ext-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix dsq_local_on selftest
  sched_ext: initialize kit->cursor.flags
  sched_ext: Fix invalid irq restore in scx_ops_bypass()
  MAINTAINERS: add me as reviewer for sched_ext
  MAINTAINERS: add self as reviewer for sched_ext
  scx: Fix maximal BPF selftest prog
  sched_ext: fix application of sizeof to pointer
  selftests/sched_ext: fix build after renames in sched_ext API
  sched_ext: Add __weak to fix the build errors
2025-01-03 15:09:12 -08:00
Linus Torvalds
f9aa1fb9f8 Merge tag 'wq-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:

 - Suppress a corner case spurious flush dependency warning

 - Two trivial changes

* tag 'wq-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: add printf attribute to __alloc_workqueue()
  workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker
  rust: add safety comment in workqueue traits
2025-01-03 15:03:56 -08:00
Linus Torvalds
2ae3aab557 Merge tag 'block-6.13-20250103' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
 "Collection of fixes for block. Particularly the target name overflow
  has been a bit annoying, as it results in overwriting random memory
  and hence shows up as triggering various other bugs.

   - NVMe pull request via Keith:
      - Fix device specific quirk for PRP list alignment (Robert)
      - Fix target name overflow (Leo)
      - Fix target write granularity (Luis)
      - Fix target sleeping in atomic context (Nilay)
      - Remove unnecessary tcp queue teardown (Chunguang)

   - Simple cdrom typo fix"

* tag 'block-6.13-20250103' of git://git.kernel.dk/linux:
  cdrom: Fix typo, 'devicen' to 'device'
  nvme-tcp: remove nvme_tcp_destroy_io_queues()
  nvmet-loop: avoid using mutex in IO hotpath
  nvmet: propagate npwg topology
  nvmet: Don't overflow subsysnqn
  nvme-pci: 512 byte aligned dma pool segment quirk
2025-01-03 14:58:57 -08:00
Linus Torvalds
a984e234fc Merge tag 'io_uring-6.13-20250103' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:

 - Fix an issue with the read multishot support and posting of CQEs from
   io-wq context

 - Fix a regression introduced in this cycle, where making the timeout
   lock a raw one uncovered another locking dependency. As a result,
   move the timeout flushing outside of the timeout lock, punting them
   to a local list first

 - Fix use of an uninitialized variable in io_async_msghdr. Doesn't
   really matter functionally, but silences a valid KMSAN complaint that
   it's not always initialized

 - Fix use of incrementally provided buffers for read on non-pollable
   files, where the buffer always gets committed upfront. Unfortunately
   the buffer address isn't resolved first, so the read ends up using
   the updated rather than the current value

* tag 'io_uring-6.13-20250103' of git://git.kernel.dk/linux:
  io_uring/kbuf: use pre-committed buffer address for non-pollable file
  io_uring/net: always initialize kmsg->msg.msg_inq upfront
  io_uring/timeout: flush timeouts outside of the timeout lock
  io_uring/rw: fix downgraded mshot read
2025-01-03 14:45:59 -08:00
Linus Torvalds
aba74e639f Merge tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireles and netfilter.

  Nothing major here. Over the last two weeks we gathered only around
  two-thirds of our normal weekly fix count, but delaying sending these
  until -rc7 seemed like a really bad idea.

  AFAIK we have no bugs under investigation. One or two reverts for
  stuff for which we haven't gotten a proper fix will likely come in the
  next PR.

  Current release - fix to a fix:

   - netfilter: nft_set_hash: unaligned atomic read on struct
     nft_set_ext

   - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup

  Previous releases - regressions:

   - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets

   - mptcp:
      - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
      - fix TCP options overflow
      - prevent excessive coalescing on receive, fix throughput

   - net: fix memory leak in tcp_conn_request() if map insertion fails

   - wifi: cw1200: fix potential NULL dereference after conversion to
     GPIO descriptors

   - phy: micrel: dynamically control external clock of KSZ PHY, fix
     suspend behavior

  Previous releases - always broken:

   - af_packet: fix VLAN handling with MSG_PEEK

   - net: restrict SO_REUSEPORT to inet sockets

   - netdev-genl: avoid empty messages in NAPI get

   - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X

   - eth:
      - gve: XDP fixes around transmit, queue wakeup etc.
      - ti: icssg-prueth: fix firmware load sequence to prevent time
        jump which breaks timesync related operations

  Misc:

   - netlink: specs: mptcp: add missing attr and improve documentation"

* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
  net: ti: icssg-prueth: Fix firmware load sequence.
  mptcp: prevent excessive coalescing on receive
  mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
  mptcp: fix recvbuffer adjust on sleeping rcvmsg
  ila: serialize calls to nf_register_net_hooks()
  af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
  af_packet: fix vlan_get_tci() vs MSG_PEEK
  net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
  net: restrict SO_REUSEPORT to inet sockets
  net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
  net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
  net: wwan: t7xx: Fix FSM command timeout issue
  sky2: Add device ID 11ab:4373 for Marvell 88E8075
  mptcp: fix TCP options overflow.
  net: mv643xx_eth: fix an OF node reference leak
  gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
  eth: bcmsysport: fix call balance of priv->clk handling routines
  net: llc: reset skb->transport_header
  netlink: specs: mptcp: fix missing doc
  ...
2025-01-03 14:36:54 -08:00
Linus Torvalds
ee063c23e4 Merge tag 'nios2_update_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fixlet from Dinh Nguyen:

 - Use str_yes_no() helper function

* tag 'nios2_update_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  nios2: Use str_yes_no() helper in show_cpuinfo()
2025-01-03 14:16:25 -08:00
Linus Torvalds
dea3165f98 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
 "A lot of fixes accumulated over the holiday break:

   - Static tool fixes, value is already proven to be NULL, possible
     integer overflow

   - Many bnxt_re fixes:
      - Crashes due to a mismatch in the maximum SGE list size
      - Don't waste memory for user QPs by creating kernel-only
        structures
      - Fix compatability issues with older HW in some of the new HW
        features recently introduced: RTS->RTS feature, work around 9096
      - Do not allow destroy_qp to fail
      - Validate QP MTU against device limits
      - Add missing validation on madatory QP attributes for RTR->RTS
      - Report port_num in query_qp as required by the spec
      - Fix creation of QPs of the maximum queue size, and in the
        variable mode
      - Allow all QPs to be used on newer HW by limiting a work around
        only to HW it affects
      - Use the correct MSN table size for variable mode QPs
      - Add missing locking in create_qp() accessing the qp_tbl
      - Form WQE buffers correctly when some of the buffers are 0 hop
      - Don't crash on QP destroy if the userspace doesn't setup the
        dip_ctx
      - Add the missing QP flush handler call on the DWQE path to avoid
        hanging on error recovery
      - Consistently use ENXIO for return codes if the devices is
        fatally errored

   - Try again to fix VLAN support on iwarp, previous fix was reverted
     due to breaking other cards

   - Correct error path return code for rdma netlink events

   - Remove the seperate net_device pointer in siw and rxe which
     syzkaller found a way to UAF

   - Fix a UAF of a stack ib_sge in rtrs

   - Fix a regression where old mlx5 devices and FW were wrongly
     activing new device features and failing"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
  RDMA/mlx5: Enable multiplane mode only when it is supported
  RDMA/bnxt_re: Fix error recovery sequence
  RDMA/rtrs: Ensure 'ib_sge list' is accessible
  RDMA/rxe: Remove the direct link to net_device
  RDMA/hns: Fix missing flush CQE for DWQE
  RDMA/hns: Fix warning storm caused by invalid input in IO path
  RDMA/hns: Fix accessing invalid dip_ctx during destroying QP
  RDMA/hns: Fix mapping error of zero-hop WQE buffer
  RDMA/bnxt_re: Fix the locking while accessing the QP table
  RDMA/bnxt_re: Fix MSN table size for variable wqe mode
  RDMA/bnxt_re: Add send queue size check for variable wqe
  RDMA/bnxt_re: Disable use of reserved wqes
  RDMA/bnxt_re: Fix max_qp_wrs reported
  RDMA/siw: Remove direct link to net_device
  RDMA/nldev: Set error code in rdma_nl_notify_event
  RDMA/bnxt_re: Fix reporting hw_ver in query_device
  RDMA/bnxt_re: Fix to export port num to ib_query_qp
  RDMA/bnxt_re: Fix setting mandatory attributes for modify_qp
  RDMA/bnxt_re: Add check for path mtu in modify_qp
  RDMA/bnxt_re: Fix the check for 9060 condition
  ...
2025-01-03 11:09:35 -08:00
Linus Torvalds
f274fffbc2 Merge tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:

 - A small Kconfig fixup for the i.MX.

   In principle this could come in from the SoC tree but the bug was
   introduced from the pin control tree so let's fix it from here.

 - Fix a sleep in atomic context in the MCP23xxx GPIO expander by
   disabling the regmap locking and using explicit mutex locks.

* tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking
  ARM: imx: Re-introduce the PINCTRL selection
2025-01-03 10:57:57 -08:00
Linus Torvalds
4f5d3da619 Merge tag 'sound-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "The first new year pull request: no surprises, all small fixes,
  including:

   - Follow-up fixes for the new compress-offload API extension

   - A couple of fixes for MIDI 2.0 UMP handling

   - A trivial race fix for OSS sequencer emulation ioctls

   - USB-audio and HD-audio fixes / quirks"

* tag 'sound-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: seq: Check UMP support for midi_version change
  ALSA hda/realtek: Add quirk for Framework F111:000C
  Revert "ALSA: ump: Don't enumeration invalid groups for legacy rawmidi"
  ALSA: seq: oss: Fix races at processing SysEx messages
  ALSA: compress_offload: fix remaining descriptor races in sound/core/compress_offload.c
  ALSA: compress_offload: Drop unneeded no_free_ptr()
  ALSA: hda/tas2781: Ignore SUBSYS_ID not found for tas2563 projects
  ALSA: usb-audio: US16x08: Initialize array before use
2025-01-03 10:54:51 -08:00
Linus Torvalds
92c3bb3d2e Merge tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Happy New Year.

  It was fairly quiet for holidays period, certainly nothing that worth
  getting off the couch before I needed to, this is for the past two
  weeks, i915, xe and some adv7511, I expect we will see some amdgpu etc
  happening next week, but otherwise all quiet.

  i915:
   - Fix C10 pll programming sequence [cx0_phy]
   - Fix power gate sequence. [dg1]

  xe:
   - uapi: Revert some devcoredump file format changes breaking a mesa
     debug tool
   - Fixes around waits when moving to system
   - Fix a typo when checking for LMEM provisioning
   - Fix a fault on fd close after unbind
   - A couple of OA fixes squashed for stable backporting

  adv7511:
   - fix UAF
   - drop single lane support
   - audio infoframe fix"

* tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel:
  xe/oa: Fix query mode of operation for OAR/OAC
  drm/i915/dg1: Fix power gate sequence.
  drm/i915/cx0_phy: Fix C10 pll programming sequence
  drm/xe: Fix fault on fd close after unbind
  drm/xe/pf: Use correct function to check LMEM provisioning
  drm/xe: Wait for migration job before unmapping pages
  drm/xe: Use non-interruptible wait when moving BO to system
  drm/xe: Revert some changes that break a mesa debug tool
  drm: adv7511: Drop dsi single lane support
  dt-bindings: display: adi,adv7533: Drop single lane support
  drm: adv7511: Fix use-after-free in adv7533_attach_dsi()
  drm/bridge: adv7511_audio: Update Audio InfoFrame properly
2025-01-03 10:06:44 -08:00
Linus Torvalds
e30dd219c7 Merge tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:

 - Add needed READ_ONCE() around access to the fgraph array element

   The updates to the fgraph array can happen when callbacks are
   registered and unregistered. The __ftrace_return_to_handler() can
   handle reading either the old value or the new value. But once it
   reads that value it must stay consistent otherwise the check that
   looks to see if the value is a stub may show false, but if the
   compiler decides to re-read after that check, it can be true which
   can cause the code to crash later on.

 - Make function profiler use the top level ops for filtering again

   When function graph became available for instances, its filter ops
   became independent from the top level set_ftrace_filter. In the
   process the function profiler received its own filter ops as well.
   But the function profiler uses the top level set_ftrace_filter file
   and does not have one of its own. In giving it its own filter ops, it
   lost any user interface it once had. Make it use the top level
   set_ftrace_filter file again. This fixes a regression.

* tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Fix function profiler's filtering functionality
  fgraph: Add READ_ONCE() when accessing fgraph_array[]
2025-01-03 10:04:43 -08:00
Jens Axboe
ed123c948d io_uring/kbuf: use pre-committed buffer address for non-pollable file
For non-pollable files, buffer ring consumption will commit upfront.
This is fine, but io_ring_buffer_select() will return the address of the
buffer after having committed it. For incrementally consumed buffers,
this is incorrect as it will modify the buffer address.

Store the pre-committed value and return that. If that isn't done, then
the initial part of the buffer is not used and the application will
correctly assume the content arrived at the start of the userspace
buffer, but the kernel will have put it later in the buffer. Or it can
cause a spurious -EFAULT returned in the CQE, depending on the buffer
size. As bounds are suitably checked for doing the actual IO, no adverse
side effects are possible - it's just a data misplacement within the
existing buffer.

Reported-by: Gwendal Fernet <gwendalfernet@gmail.com>
Cc: stable@vger.kernel.org
Fixes: ae98dbf43d ("io_uring/kbuf: add support for incremental buffer consumption")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-03 09:38:37 -07:00
Mark Zhang
45d339fefa RDMA/mlx5: Enable multiplane mode only when it is supported
Driver queries vport_cxt.num_plane and enables multiplane when it is
greater then 0, but some old FWs (versions from x.40.1000 till x.42.1000),
report vport_cxt.num_plane = 1 unexpectedly.

Fix it by querying num_plane only when HCA_CAP2.multiplane bit is set.

Fixes: 2a5db20fa5 ("RDMA/mlx5: Add support to multi-plane device and port")
Link: https://patch.msgid.link/r/1ef901acdf564716fcf550453cf5e94f343777ec.1734610916.git.leon@kernel.org
Cc: stable@vger.kernel.org
Reported-by: Francesco Poli <invernomuto@paranoici.org>
Closes: https://lore.kernel.org/all/nvs4i2v7o6vn6zhmtq4sgazy2hu5kiulukxcntdelggmznnl7h@so3oul6uwgbl/
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-03 09:17:19 -04:00
David S. Miller
ce21419b55 Merge branch 'net-iep-clock-module-fixes'
Meghana Malladi says:

====================
IEP clock module bug fixes

This series has some bug fixes for IEP module needed by PPS and
timesync operations.

Patch 1/2 fixes firmware load sequence to run all the firmwares
when either of the ethernet interfaces is up. Move all the code
common for firmware bringup under common functions.

Patch 2/2 fixes distorted PPS signal when the ethernet interfaces
are brough down and up. This patch also fixes enabling PPS signal
after bringing the interface up, without disabling PPS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-03 11:54:06 +00:00
Meghana Malladi
9b11536124 net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
When ICSSG interfaces are brought down and brought up again, the
pru cores are shut down and booted again, flushing out all the memories
and start again in a clean state. Hence it is expected that the
IEP_CMP_CFG register needs to be flushed during iep_init() to ensure
that the existing residual configuration doesn't cause any unusual
behavior. If the register is not cleared, existing IEP_CMP_CFG set for
CMP1 will result in SYNC0_OUT signal based on the SYNC_OUT register values.

After bringing the interface up, calling PPS enable doesn't work as
the driver believes PPS is already enabled, (iep->pps_enabled is not
cleared during interface bring down) and driver will just return true
even though there is no signal. Fix this by disabling pps and perout.

Fixes: c1e0230eea ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-03 11:54:06 +00:00
MD Danish Anwar
9facce84f4 net: ti: icssg-prueth: Fix firmware load sequence.
Timesync related operations are ran in PRU0 cores for both ICSSG SLICE0
and SLICE1. Currently whenever any ICSSG interface comes up we load the
respective firmwares to PRU cores and whenever interface goes down, we
stop the resective cores. Due to this, when SLICE0 goes down while
SLICE1 is still active, PRU0 firmwares are unloaded and PRU0 core is
stopped. This results in clock jump for SLICE1 interface as the timesync
related operations are no longer running.

As there are interdependencies between SLICE0 and SLICE1 firmwares,
fix this by running both PRU0 and PRU1 firmwares as long as at least 1
ICSSG interface is up. Add new flag in prueth struct to check if all
firmwares are running and remove the old flag (fw_running).

Use emacs_initialized as reference count to load the firmwares for the
first and last interface up/down. Moving init_emac_mode and fw_offload_mode
API outside of icssg_config to icssg_common_start API as they need
to be called only once per firmware boot.

Change prueth_emac_restart() to return error code and add error prints
inside the caller of this functions in case of any failures.

Move prueth_emac_stop() from common to sr1 driver.
sr1 and sr2 drivers have different logic handling for stopping
the firmwares. While sr1 driver is dependent on emac structure
to stop the corresponding pru cores for that slice, for sr2
all the pru cores of both the slices are stopped and is not
dependent on emac. So the prueth_emac_stop() function is no
longer common and can be moved to sr1 driver.

Fixes: c1e0230eea ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-03 11:54:06 +00:00
Jakub Kicinski
3473020d0f Merge branch 'mptcp-rx-path-fixes'
Matthieu Baerts says:

====================
mptcp: rx path fixes

Here are 3 different fixes, all related to the MPTCP receive buffer:

- Patch 1: fix receive buffer space when recvmsg() blocks after
  receiving some data. For a fix introduced in v6.12, backported to
  v6.1.

- Patch 2: mptcp_cleanup_rbuf() can be called when no data has been
  copied. For 5.11.

- Patch 3: prevent excessive coalescing on receive, which can affect the
  throughput badly. It looks better to wait a bit before backporting
  this one to stable versions, to get more results. For 5.10.
====================

Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:44:06 -08:00
Paolo Abeni
56b824eb49 mptcp: prevent excessive coalescing on receive
Currently the skb size after coalescing is only limited by the skb
layout (the skb must not carry frag_list). A single coalesced skb
covering several MSS can potentially fill completely the receive
buffer. In such a case, the snd win will zero until the receive buffer
will be empty again, affecting tput badly.

Fixes: 8268ed4c9d ("mptcp: introduce and use mptcp_try_coalesce()")
Cc: stable@vger.kernel.org # please delay 2 weeks after 6.13-final release
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-3-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:44:03 -08:00
Paolo Abeni
551844f26d mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
Under some corner cases the MPTCP protocol can end-up invoking
mptcp_cleanup_rbuf() when no data has been copied, but such helper
assumes the opposite condition.

Explicitly drop such assumption and performs the costly call only
when strictly needed - before releasing the msk socket lock.

Fixes: fd8976790a ("mptcp: be careful on MPTCP-level ack.")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-2-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:44:03 -08:00
Paolo Abeni
449e6912a2 mptcp: fix recvbuffer adjust on sleeping rcvmsg
If the recvmsg() blocks after receiving some data - i.e. due to
SO_RCVLOWAT - the MPTCP code will attempt multiple times to
adjust the receive buffer size, wrongly accounting every time the
cumulative of received data - instead of accounting only for the
delta.

Address the issue moving mptcp_rcv_space_adjust just after the
data reception and passing it only the just received bytes.

This also removes an unneeded difference between the TCP and MPTCP
RX code path implementation.

Fixes: 5813022985 ("mptcp: error out earlier on disconnect")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-1-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:44:03 -08:00
Eric Dumazet
260466b576 ila: serialize calls to nf_register_net_hooks()
syzbot found a race in ila_add_mapping() [1]

commit 031ae72825 ("ila: call nf_unregister_net_hooks() sooner")
attempted to fix a similar issue.

Looking at the syzbot repro, we have concurrent ILA_CMD_ADD commands.

Add a mutex to make sure at most one thread is calling nf_register_net_hooks().

[1]
 BUG: KASAN: slab-use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline]
 BUG: KASAN: slab-use-after-free in __rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
Read of size 4 at addr ffff888028f40008 by task dhcpcd/5501

CPU: 1 UID: 0 PID: 5501 Comm: dhcpcd Not tainted 6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <IRQ>
  __dump_stack lib/dump_stack.c:94 [inline]
  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
  print_address_description mm/kasan/report.c:378 [inline]
  print_report+0xc3/0x620 mm/kasan/report.c:489
  kasan_report+0xd9/0x110 mm/kasan/report.c:602
  rht_key_hashfn include/linux/rhashtable.h:159 [inline]
  __rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
  rhashtable_lookup include/linux/rhashtable.h:646 [inline]
  rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
  ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:127 [inline]
  ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline]
  ila_nf_input+0x1ee/0x620 net/ipv6/ila/ila_xlat.c:185
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xbb/0x200 net/netfilter/core.c:626
  nf_hook.constprop.0+0x42e/0x750 include/linux/netfilter.h:269
  NF_HOOK include/linux/netfilter.h:312 [inline]
  ipv6_rcv+0xa4/0x680 net/ipv6/ip6_input.c:309
  __netif_receive_skb_one_core+0x12e/0x1e0 net/core/dev.c:5672
  __netif_receive_skb+0x1d/0x160 net/core/dev.c:5785
  process_backlog+0x443/0x15f0 net/core/dev.c:6117
  __napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6883
  napi_poll net/core/dev.c:6952 [inline]
  net_rx_action+0xa94/0x1010 net/core/dev.c:7074
  handle_softirqs+0x213/0x8f0 kernel/softirq.c:561
  __do_softirq kernel/softirq.c:595 [inline]
  invoke_softirq kernel/softirq.c:435 [inline]
  __irq_exit_rcu+0x109/0x170 kernel/softirq.c:662
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
  sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049

Fixes: 7f00feaf10 ("ila: Add generic ILA translation facility")
Reported-by: syzbot+47e761d22ecf745f72b9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c9ae.050a0220.2f3838.04c7.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Link: https://patch.msgid.link/20241230162849.2795486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:42:32 -08:00
Eric Dumazet
f91a5b8089 af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found
by syzbot.

Rework vlan_get_protocol_dgram() to not touch skb at all,
so that it can be used from many cpus on the same skb.

Add a const qualifier to skb argument.

[1]
skbuff: skb_under_panic: text:ffffffff8a8ccd05 len:29 put:14 head:ffff88807fc8e400 data:ffff88807fc8e3f4 tail:0x11 end:0x140 dev:<NULL>
------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5892 Comm: syz-executor883 Not tainted 6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
 RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0b 8d 48 c7 c6 86 d5 25 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 5a 69 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc900038d7638 EFLAGS: 00010282
RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 609ffd18ea660600
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88802483c8d0 R08: ffffffff817f0a8c R09: 1ffff9200071ae60
R10: dffffc0000000000 R11: fffff5200071ae61 R12: 0000000000000140
R13: ffff88807fc8e400 R14: ffff88807fc8e3f4 R15: 0000000000000011
FS:  00007fbac5e006c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbac5e00d58 CR3: 000000001238e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  skb_push+0xe5/0x100 net/core/skbuff.c:2636
  vlan_get_protocol_dgram+0x165/0x290 net/packet/af_packet.c:585
  packet_recvmsg+0x948/0x1ef0 net/packet/af_packet.c:3552
  sock_recvmsg_nosec net/socket.c:1033 [inline]
  sock_recvmsg+0x22f/0x280 net/socket.c:1055
  ____sys_recvmsg+0x1c6/0x480 net/socket.c:2803
  ___sys_recvmsg net/socket.c:2845 [inline]
  do_recvmmsg+0x426/0xab0 net/socket.c:2940
  __sys_recvmmsg net/socket.c:3014 [inline]
  __do_sys_recvmmsg net/socket.c:3037 [inline]
  __se_sys_recvmmsg net/socket.c:3030 [inline]
  __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3030
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 79eecf631c ("af_packet: Handle outgoing VLAN packets without hardware offloading")
Reported-by: syzbot+74f70bb1cb968bf09e4f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c485.050a0220.2f3838.04c5.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241230161004.2681892-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:40:59 -08:00
Eric Dumazet
77ee7a6d16 af_packet: fix vlan_get_tci() vs MSG_PEEK
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found
by syzbot.

Rework vlan_get_tci() to not touch skb at all,
so that it can be used from many cpus on the same skb.

Add a const qualifier to skb argument.

[1]
skbuff: skb_under_panic: text:ffffffff8a8da482 len:32 put:14 head:ffff88807a1d5800 data:ffff88807a1d5810 tail:0x14 end:0x140 dev:<NULL>
------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 5880 Comm: syz-executor172 Not tainted 6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
 RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0b 8d 48 c7 c6 9e 6c 26 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 3a 5a 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc90003baf5b8 EFLAGS: 00010286
RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 8565c1eec37aa000
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88802616fb50 R08: ffffffff817f0a4c R09: 1ffff92000775e50
R10: dffffc0000000000 R11: fffff52000775e51 R12: 0000000000000140
R13: ffff88807a1d5800 R14: ffff88807a1d5810 R15: 0000000000000014
FS:  00007fa03261f6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd65753000 CR3: 0000000031720000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  skb_push+0xe5/0x100 net/core/skbuff.c:2636
  vlan_get_tci+0x272/0x550 net/packet/af_packet.c:565
  packet_recvmsg+0x13c9/0x1ef0 net/packet/af_packet.c:3616
  sock_recvmsg_nosec net/socket.c:1044 [inline]
  sock_recvmsg+0x22f/0x280 net/socket.c:1066
  ____sys_recvmsg+0x1c6/0x480 net/socket.c:2814
  ___sys_recvmsg net/socket.c:2856 [inline]
  do_recvmmsg+0x426/0xab0 net/socket.c:2951
  __sys_recvmmsg net/socket.c:3025 [inline]
  __do_sys_recvmmsg net/socket.c:3048 [inline]
  __se_sys_recvmmsg net/socket.c:3041 [inline]
  __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3041
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83

Fixes: 79eecf631c ("af_packet: Handle outgoing VLAN packets without hardware offloading")
Reported-by: syzbot+8400677f3fd43f37d3bc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772c485.050a0220.2f3838.04c6.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241230161004.2681892-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:40:54 -08:00
Maciej S. Szmigiero
a7af435df0 net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
ipc_mmio_init() used the post-decrement operator in its loop continuing
condition of "retries" counter being "> 0", which meant that when this
condition caused loop exit "retries" counter reached -1.

But the later valid exec stage failure check only tests for "retries"
counter being exactly zero, so it didn't trigger in this case (but
would wrongly trigger if the code reaches a valid exec stage in the
very last loop iteration).

Fix this by using the pre-decrement operator instead, so the loop counter
is exactly zero on valid exec stage failure.

Fixes: dc0514f5d8 ("net: iosm: mmio scratchpad")
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Link: https://patch.msgid.link/8b19125a825f9dcdd81c667c1e5c48ba28d505a6.1735490770.git.mail@maciej.szmigiero.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 18:37:50 -08:00
Eric Dumazet
5b0af621c3 net: restrict SO_REUSEPORT to inet sockets
After blamed commit, crypto sockets could accidentally be destroyed
from RCU call back, as spotted by zyzbot [1].

Trying to acquire a mutex in RCU callback is not allowed.

Restrict SO_REUSEPORT socket option to inet sockets.

v1 of this patch supported TCP, UDP and SCTP sockets,
but fcnal-test.sh test needed RAW and ICMP support.

[1]
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 24, name: ksoftirqd/1
preempt_count: 100, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by ksoftirqd/1/24:
  #0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
  #0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2561 [inline]
  #0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_core+0xa37/0x17a0 kernel/rcu/tree.c:2823
Preemption disabled at:
 [<ffffffff8161c8c8>] softirq_handle_begin kernel/softirq.c:402 [inline]
 [<ffffffff8161c8c8>] handle_softirqs+0x128/0x9b0 kernel/softirq.c:537
CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.13.0-rc3-syzkaller-00174-ga024e377efed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:94 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
  __might_resched+0x5d4/0x780 kernel/sched/core.c:8758
  __mutex_lock_common kernel/locking/mutex.c:562 [inline]
  __mutex_lock+0x131/0xee0 kernel/locking/mutex.c:735
  crypto_put_default_null_skcipher+0x18/0x70 crypto/crypto_null.c:179
  aead_release+0x3d/0x50 crypto/algif_aead.c:489
  alg_do_release crypto/af_alg.c:118 [inline]
  alg_sock_destruct+0x86/0xc0 crypto/af_alg.c:502
  __sk_destruct+0x58/0x5f0 net/core/sock.c:2260
  rcu_do_batch kernel/rcu/tree.c:2567 [inline]
  rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
  handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561
  run_ksoftirqd+0xca/0x130 kernel/softirq.c:950
  smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Fixes: 8c7138b33e ("net: Unpublish sk from sk_reuseport_cb before call_rcu")
Reported-by: syzbot+b3e02953598f447d4d2a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6772f2f4.050a0220.2f3838.04cb.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241231160527.3994168-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:16:58 -08:00
Willem de Bruijn
68e068cabd net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
The blamed commit disabled hardware offoad of IPv6 packets with
extension headers on devices that advertise NETIF_F_IPV6_CSUM,
based on the definition of that feature in skbuff.h:

 *   * - %NETIF_F_IPV6_CSUM
 *     - Driver (device) is only able to checksum plain
 *       TCP or UDP packets over IPv6. These are specifically
 *       unencapsulated packets of the form IPv6|TCP or
 *       IPv6|UDP where the Next Header field in the IPv6
 *       header is either TCP or UDP. IPv6 extension headers
 *       are not supported with this feature. This feature
 *       cannot be set in features for a device with
 *       NETIF_F_HW_CSUM also set. This feature is being
 *       DEPRECATED (see below).

The change causes skb_warn_bad_offload to fire for BIG TCP
packets.

[  496.310233] WARNING: CPU: 13 PID: 23472 at net/core/dev.c:3129 skb_warn_bad_offload+0xc4/0xe0

[  496.310297]  ? skb_warn_bad_offload+0xc4/0xe0
[  496.310300]  skb_checksum_help+0x129/0x1f0
[  496.310303]  skb_csum_hwoffload_help+0x150/0x1b0
[  496.310306]  validate_xmit_skb+0x159/0x270
[  496.310309]  validate_xmit_skb_list+0x41/0x70
[  496.310312]  sch_direct_xmit+0x5c/0x250
[  496.310317]  __qdisc_run+0x388/0x620

BIG TCP introduced an IPV6_TLV_JUMBO IPv6 extension header to
communicate packet length, as this is an IPv6 jumbogram. But, the
feature is only enabled on devices that support BIG TCP TSO. The
header is only present for PF_PACKET taps like tcpdump, and not
transmitted by physical devices.

For this specific case of extension headers that are not
transmitted, return to the situation before the blamed commit
and support hardware offload.

ipv6_has_hopopt_jumbo() tests not only whether this header is present,
but also that it is the only extension header before a terminal (L4)
header.

Fixes: 04c20a9356 ("net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iK1hdC3Nt8KPhOtTF8vCPc1AHDCtse_BTNki1pWxAByTQ@mail.gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250101164909.1331680-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:13:44 -08:00
Liang Jie
a8620de72e net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
In efx_tc_ct_zone_ht_params, the key_len was previously set to
offsetof(struct efx_tc_ct_zone, linkage). This calculation is incorrect
because it includes any padding between the zone field and the linkage
field due to structure alignment, which can vary between systems.

This patch updates key_len to use sizeof_field(struct efx_tc_ct_zone, zone)
, ensuring that the hash table correctly uses the zone as the key. This fix
prevents potential hash lookup errors and improves connection tracking
reliability.

Fixes: c3bb5c6acd ("sfc: functions to register for conntrack zone offload")
Signed-off-by: Liang Jie <liangjie@lixiang.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20241230093709.3226854-1-buaajxlj@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-02 17:11:41 -08:00
Dave Airlie
273b3eb600 Merge tag 'drm-xe-fixes-2025-01-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- A couple of OA fixes squashed for stable backporting (Umesh)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z3bur0RmH6-70YSh@fedora
2025-01-03 10:57:31 +10:00
Dave Airlie
198c653edf Merge tag 'drm-misc-fixes-2025-01-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.13-rc6:
- Only fixes for adv7511 driver, including a use-after-free.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f58429b7-5f11-4b78-b577-de32b41299ea@linux.intel.com
2025-01-03 10:43:36 +10:00
Dave Airlie
48fc4378de Merge tag 'drm-intel-fixes-2024-12-25' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix C10 pll programming sequence [cx0_phy] (Suraj Kandpal)
- Fix power gate sequence. [dg1] (Rodrigo Vivi)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z2wKf7tmElKFdnoP@linux
2025-01-03 10:40:43 +10:00
Dave Airlie
3bce3cc644 Merge tag 'drm-xe-fixes-2024-12-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
- Revert some devcoredump file format changes
  breaking a mesa debug tool (John)

Driver Changes:
- Fixes around waits when moving to system (Nirmoy)
- Fix a typo when checking for LMEM provisioning (Michal)
- Fix a fault on fd close after unbind (Lucas)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z2mjt7OTfH76cgua@fedora
2025-01-03 10:28:48 +10:00
Jens Axboe
c6e60a0a68 io_uring/net: always initialize kmsg->msg.msg_inq upfront
syzbot reports that ->msg_inq may get used uinitialized from the
following path:

BUG: KMSAN: uninit-value in io_recv_buf_select io_uring/net.c:1094 [inline]
BUG: KMSAN: uninit-value in io_recv+0x930/0x1f90 io_uring/net.c:1158
 io_recv_buf_select io_uring/net.c:1094 [inline]
 io_recv+0x930/0x1f90 io_uring/net.c:1158
 io_issue_sqe+0x420/0x2130 io_uring/io_uring.c:1740
 io_queue_sqe io_uring/io_uring.c:1950 [inline]
 io_req_task_submit+0xfa/0x1d0 io_uring/io_uring.c:1374
 io_handle_tw_list+0x55f/0x5c0 io_uring/io_uring.c:1057
 tctx_task_work_run+0x109/0x3e0 io_uring/io_uring.c:1121
 tctx_task_work+0x6d/0xc0 io_uring/io_uring.c:1139
 task_work_run+0x268/0x310 kernel/task_work.c:239
 io_run_task_work+0x43a/0x4a0 io_uring/io_uring.h:343
 io_cqring_wait io_uring/io_uring.c:2527 [inline]
 __do_sys_io_uring_enter io_uring/io_uring.c:3439 [inline]
 __se_sys_io_uring_enter+0x204f/0x4ce0 io_uring/io_uring.c:3330
 __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3330
 x64_sys_call+0xce5/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:427
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

and it is correct, as it's never initialized upfront. Hence the first
submission can end up using it uninitialized, if the recv wasn't
successful and the networking stack didn't honor ->msg_get_inq being set
and filling in the output value of ->msg_inq as requested.

Set it to 0 upfront when it's allocated, just to silence this KMSAN
warning. There's no side effect of using it uninitialized, it'll just
potentially cause the next receive to use a recv value hint that's not
accurate.

Fixes: c6f32c7d9e ("io_uring/net: get rid of ->prep_async() for receive side")
Reported-by: syzbot+068ff190354d2f74892f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-02 16:40:08 -07:00