mirror of
https://github.com/torvalds/linux.git
synced 2026-01-25 15:03:52 +08:00
53989fad1286e652ea3655ae3367ba698da8d2ff
A test of the form:
while true; do modprobe -r cxl_pmem; modprobe cxl_pmem; done
May lead to a crash signature of the form:
BUG: unable to handle page fault for address: ffffffffc0660030
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
[..]
Workqueue: cxl_pmem 0xffffffffc0660030
RIP: 0010:0xffffffffc0660030
Code: Unable to access opcode bytes at RIP 0xffffffffc0660006.
[..]
Call Trace:
? process_one_work+0x4ec/0x9c0
? pwq_dec_nr_in_flight+0x100/0x100
? rwlock_bug.part.0+0x60/0x60
? worker_thread+0x2eb/0x700
In that report the 0xffffffffc0660030 address corresponds to the former
function address of cxl_nvb_update_state() from a previous load of the
module, not the current address. Fix that by arranging for ->state_work
in the 'struct cxl_nvdimm_bridge' object to be reinitialized on cxl_pmem
module reload.
Details:
Recall that CXL subsystem wants to link a CXL memory expander device to
an NVDIMM sub-hierarchy when both a persistent memory range has been
registered by the CXL platform driver (cxl_acpi) *and* when that CXL
memory expander has published persistent memory capacity (Get Partition
Info). To this end the cxl_nvdimm_bridge driver arranges to rescan the
CXL bus when either of those conditions change. The helper
bus_rescan_devices() can not be called underneath the device_lock() for
any device on that bus, so the cxl_nvdimm_bridge driver uses a workqueue
for the rescan.
Typically a driver allocates driver data to hold a 'struct work_struct'
for a driven device, but for a workqueue that may run after ->remove()
returns, driver data will have been freed. The 'struct
cxl_nvdimm_bridge' object holds the state and work_struct directly.
Unfortunately it was only arranging for that infrastructure to be
initialized once per device creation rather than the necessary once per
workqueue (cxl_pmem_wq) creation.
Introduce is_cxl_nvdimm_bridge() and cxl_nvdimm_bridge_reset() in
support of invalidating stale references to a recently destroyed
cxl_pmem_wq.
Cc: <stable@vger.kernel.org>
Fixes: 8fdcb1704f ("cxl/pmem: Add initial infrastructure for pmem support")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/163665474585.3505991.8397182770066720755.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
…
…
…
…
Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Description
Languages
C
97.1%
Assembly
1%
Shell
0.6%
Rust
0.4%
Python
0.4%
Other
0.3%