This driver's remove path calls cancel_delayed_work(). However, that
function does not wait until the work function finishes. This means
that the callback function may still be running after the driver's
remove function has finished, which would result in a use-after-free.
Fix by calling cancel_delayed_work_sync(), which ensures that
the work is properly cancelled, no longer running, and unable
to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210407092716.3270248-1-yangyingliang@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It appears that the STM code didn't manage to accurately decipher the
delicate inner workings of an alternative thought process behind the
UUID API and directly called generate_random_uuid() that clearly needs
to be a static function in lib/uuid.c.
At the same time, said STM code is poking directly at the byte array
inside the uuid_t when it uses the UUID for its internal purposes.
Fix these two transgressions by using intended APIs instead.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ash: changed back to uuid_t and updated the commit message]
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20210415091555.88085-1-alexander.shishkin@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Georgi writes:
interconnect changes for 5.13
These are the interconnect changes for the 5.13-rc1 merge window
with the highlights being drivers for two new platforms.
Driver changes:
- New driver for SM8350 platforms.
- New driver for SDM660 platforms.
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: sm8350: Add missing link between nodes
interconnect: qcom: sm8350: Use the correct ids
interconnect: qcom: sdm660: Fix kerneldoc warning
MAINTAINERS: icc: add interconnect tree
interconnect: qcom: Add SM8350 interconnect provider driver
dt-bindings: interconnect: Add Qualcomm SM8350 DT bindings
interconnect: qcom: icc-rpm: record slave RPM id in error log
interconnect: qcom: Add SDM660 interconnect provider driver
dt-bindings: interconnect: Add bindings for Qualcomm SDM660 NoC
Manivannan writes:
MHI changes for v5.13
core:
- Added support for Flash Programmer execution environment which allows the
host machine (like x86) to flash the modem firmware to NAND or eMMC in the
modem. The MHI bus will expose EDL channels (34, 35) and then the open source
QDL tool [1] can be used to flash the firmware from the host.
- Added an internal helper for polling the MHI registers with a retry interval.
This helper is used now to poll for the MHI ready state in MHI STATUS
register.
- Various fixes for issues found during the bringup of SDX24/SDX55 based Quectel
and Telit modems.
- Updates to the Execution environment handling for proper downloading of the
AMSS image from SBL (Secondary Bootloader) mode.
- Added support for sending STOP channel command to the MHI device and also made
changes to the MHI core for proper handling of stop and restart.
- Fixed the runtime_pm handling in the core by forcing the device to be in wake
mode until TX completion and allowing it to suspend for RX.
- Added sanity checks for values read from the device to avoid crash if those
are corrupted somehow.
- Fixed warnings generated by sparse (W=2)
- Couple of kernel doc cleanups in mhi.h
pci_generic:
- Added support for runtime PM and generic PM
- Added Firehose channels for flashing the firmware
- Added support for modems such as Quectel EM1XXGR-L, SDX24, SDX65, Foxconn
T99W175 exposing relevant channels.
[1] https://git.linaro.org/landing-teams/working/qualcomm/qdl.git
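The register-polling helper mentioned in the core list above can be pictured
with a small user-space sketch. This is not the MHI core code: the names
poll_reg_field, reg_read_fn and fake_status_read are hypothetical, and
nanosleep() stands in for whatever delay primitive the kernel helper uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor: returns 0 on success and stores the value. */
typedef int (*reg_read_fn)(void *ctx, uint32_t *val);

/*
 * Poll a register until (value & mask) == want, sleeping delay_us
 * microseconds between attempts. Returns 0 when the field matches,
 * -1 if a read fails or all attempts are used up.
 */
int poll_reg_field(reg_read_fn read_reg, void *ctx, uint32_t mask,
                   uint32_t want, unsigned int retries,
                   unsigned int delay_us)
{
    struct timespec ts = {
        .tv_sec = delay_us / 1000000u,
        .tv_nsec = (long)(delay_us % 1000000u) * 1000L,
    };

    assert(read_reg != NULL);

    for (unsigned int i = 0; i < retries; i++) {
        uint32_t val;

        if (read_reg(ctx, &val) != 0)
            return -1;
        if ((val & mask) == want)
            return 0;
        nanosleep(&ts, NULL);
    }
    return -1;
}

/* Fake device for illustration: reports "ready" (bit 0) on the third read. */
int fake_status_read(void *ctx, uint32_t *val)
{
    unsigned int *reads = ctx;

    (*reads)++;
    *val = (*reads >= 3) ? 0x1u : 0x0u;
    return 0;
}
```

With the fake reader, five attempts are enough to see the ready bit, while
two attempts run out before it appears.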
* tag 'mhi-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (49 commits)
bus: mhi: fix typo in comments for struct mhi_channel_config
bus: mhi: core: Fix shadow declarations
bus: mhi: pci_generic: Constify mhi_controller_config struct definitions
bus: mhi: pci_generic: Introduce Foxconn T99W175 support
bus: mhi: core: Sanity check values from remote device before use
bus: mhi: pci_generic: Add FIREHOSE channels
bus: mhi: pci_generic: Implement PCI shutdown callback
bus: mhi: Improve documentation on channel transfer setup APIs
bus: mhi: core: Remove __ prefix for MHI channel unprepare function
bus: mhi: core: Check channel execution environment before issuing reset
bus: mhi: core: Clear configuration from channel context during reset
bus: mhi: core: Hold device wake for channel update commands
bus: mhi: core: Update debug messages to use client device
bus: mhi: core: Improvements to the channel handling state machine
bus: mhi: core: Clear context for stopped channels from remove()
bus: mhi: core: Allow sending the STOP channel command
bus: mhi: pci_generic: Add SDX65 based modem support
bus: mhi: core: Remove pre_init flag used for power purposes
bus: mhi: pm: reduce PM state change verbosity
bus: mhi: core: Fix MHI runtime_pm behavior
...
Oded writes:
This tag contains habanalabs driver changes for v5.13:
- Add support to reset device after the user closes the file descriptor.
Because we support a single user, we can reset the device (if needed)
after a user closes its file descriptor to make sure the device is in
an idle and clean state for the next user.
- Add a new feature to allow the user to wait on interrupt. This is needed
for future ASICs.
- Replace GFP_ATOMIC with GFP_KERNEL wherever possible and add code to
support failure of allocating with GFP_ATOMIC.
- Update code to support the latest firmware image:
- More security features are done in the firmware
- Remove hard-coded assumptions and replace them with values that are
sent to the firmware on loading.
- Print device unusable error
- Reset device in case the communication between driver and firmware
gets out of sync.
- Support new PCI device ids for secured GAUDI.
- Expose current power draw through the INFO IOCTL.
- Support resetting the device upon a request from the BMC (through F/W).
- Always use only a single MSI in GAUDI, due to H/W limitation.
- Improve data-path code by taking out code from spinlock protection.
- Allow user to specify custom timeout per Command Submission.
- Some enhancements to debugfs.
- Various minor changes and improvements.
* tag 'misc-habanalabs-next-2021-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (41 commits)
habanalabs: print f/w boot unknown error
habanalabs: update to latest F/W communication header
habanalabs/gaudi: skip iATU if F/W security is enabled
habanalabs/gaudi: derive security status from pci id
habanalabs: move dram scrub to free sequence
habanalabs: send dynamic msi-x indexes to f/w
habanalabs/gaudi: clear QM errors only if not in stop_on_err mode
habanalabs: support DEVICE_UNUSABLE error indication from FW
habanalabs: use strscpy instead of sprintf and strlcpy
habanalabs: remove the store jobs array from CS IOCTL
habanalabs/gaudi: add debugfs to DMA from the device
habanalabs/gaudi: sync stream add protection to SOB reset flow
habanalabs: add custom timeout flag per cs
habanalabs: improve utilization calculation
habanalabs: support legacy and new pll indexes
habanalabs: move relevant datapath work outside cs lock
habanalabs: avoid soft lockup bug upon mapping error
habanalabs/gaudi: Update async events header
habanalabs/gaudi: unsecure TPC cfg status registers
habanalabs/gaudi: always use single-msi mode
...
When CONFIG_QCOM_SCM is y and CONFIG_HAVE_ARM_SMCCC
is not set, compiling errors are encountered as follows:
drivers/firmware/qcom_scm-smc.o: In function `__scm_smc_do_quirk':
qcom_scm-smc.c:(.text+0x36): undefined reference to `__arm_smccc_smc'
drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call':
qcom_scm-legacy.c:(.text+0xe2): undefined reference to `__arm_smccc_smc'
drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call_atomic':
qcom_scm-legacy.c:(.text+0x1f0): undefined reference to `__arm_smccc_smc'
Note that __arm_smccc_smc is defined when HAVE_ARM_SMCCC is y.
So add dependency on HAVE_ARM_SMCCC in QCOM_SCM configuration.
Fixes: 916f743da3 ("firmware: qcom: scm: Move the scm driver to drivers/firmware")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: He Ying <heying24@huawei.com>
Link: https://lore.kernel.org/r/20210406094200.60952-1-heying24@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the async binder buffer is exhausted, some normal oneway transactions
are also discarded, which may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause, and the issue is difficult to debug without the
backtrace of the thread sending the spam.
This change sends BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, requesting a dump of the current backtrace. Oneway
spamming is reported only once when the threshold is exceeded (target process
dips below 80% of its oneway space, and current process is responsible for
either more than 50 transactions, or more than 50% of the oneway space).
Detection restarts when the async buffer has returned to a
healthy state.
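The report-once behaviour described above can be sketched in user-space C.
This is an illustration under stated assumptions, not the binder code: the
struct and function names are hypothetical, and the first threshold is read
here as "less than 20% of the oneway space is still free", which is one
interpretation of the wording above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a target process's async (oneway) space. */
struct async_space {
    size_t total;        /* total async space, in bytes */
    size_t free;         /* currently free async space, in bytes */
    bool spam_reported;  /* already reported since the last healthy state */
};

/*
 * Decide whether the current sender should be flagged.
 * sender_bytes / sender_txns: async space and transaction count
 * attributed to the sender in the target's buffer.
 */
bool oneway_spam_suspect(struct async_space *s, size_t sender_bytes,
                         unsigned int sender_txns)
{
    assert(s != NULL);

    /* Healthy again (assumed: at least 20% free), so re-arm detection. */
    if (s->free >= s->total / 5) {
        s->spam_reported = false;
        return false;
    }

    /* Low on space: is this sender responsible for it? */
    if (sender_txns > 50 || sender_bytes > s->total / 2) {
        if (!s->spam_reported) {
            s->spam_reported = true;  /* report only once */
            return true;
        }
    }
    return false;
}
```

A sender holding 600 of 1000 bytes while only 100 bytes are free is flagged
on the first call and not again until the free space has recovered.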
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to print a message to the kernel log in case we encounter
an unknown error in the f/w boot to help the user understand what
happened.
In addition, we shouldn't print unknown error in case of known errors.
Moreover, in case of warnings/info, we shouldn't return -EIO that will
fail the initialization and mark the device as disabled.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As part of securing GAUDI, the F/W will configure the PCI iATU
regions. If the driver identifies a secured PCI ID, it will know to
skip iATU configuration at a very early stage.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As the F/W security indication must be available before the driver approaches
the PCI bus, F/W security should be derived from the PCI ID rather than be
fetched during the boot handshake with the F/W.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
DRAM scrubbing can take time, hence it adds to latency during allocation.
To minimize latency during initialization, scrubbing is moved to the release
call.
If scrubbing fails, it means the device is in a bad state,
hence a hard reset is initiated.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Clearing QM errors by the driver will prevent these H/W blocks from
stopping in case they are configured to stop on errors, so perform this
clearing only if this mode is not in use.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit.
On boot-up, the driver will therefore fail inserting the device.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Prefer the use of strscpy when copying the ASIC name into a char array,
to prevent accidentally exceeding the array's length.
In addition, strlcpy is frowned upon so replace it.
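The copy contract that motivates this change can be shown with a user-space
stand-in. bounded_copy below is a hypothetical helper, not the kernel's
strscpy(): it only mirrors the behaviour relied on here, namely that the
destination is always NUL-terminated and that truncation is reported to the
caller (the kernel function returns -E2BIG in that case, this sketch
returns -1).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into dst (capacity dst_size), always NUL-terminating.
 * Reads at most dst_size bytes from src. Returns the number of
 * characters copied, or -1 if src did not fit and was truncated.
 */
long bounded_copy(char *dst, const char *src, size_t dst_size)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;

    for (i = 0; i < dst_size; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return (long)i;
    }

    /* No terminator found within dst_size bytes: truncate. */
    dst[dst_size - 1] = '\0';
    return -1;
}
```

Copying "GAUDI" into a 6-byte array succeeds with a return value of 5; a
longer, made-up name is cut to fit and the caller sees -1.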
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The store part was never implemented in the code and was never used
by the userspace applications.
We currently use the related parameters for a different purpose with
a defined union. However, there is no point in that and it is better
to just remove the union and the store parameters.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When debugging a program, the user often needs to
dump large parts of the device's DRAM, which can reach tens of GBs.
Because reading from the device's internal memory through the PCI BAR
is extremely slow, the debug can take hours.
Instead, we can allow the user to copy data through one of the DMA
engines. This will make the operation much faster.
Currently, only GAUDI is supported.
In GAUDI, we need to find a PCI DMA engine that is IDLE and set the
DMA as secured to be able to bypass our MMU, as we currently don't
map the temporary buffer to the MMU.
Example bash one-liner to dump the entire HBM to a file (~2 minutes):
for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \
printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \
echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \
sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done
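For reference, the arithmetic behind the loop bounds above can be checked
with a short C sketch. The helper names are hypothetical; the two constants
are the loop limit and step taken from the one-liner, nothing more.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values taken from the one-liner above. */
#define DUMP_SIZE  0x800000000ULL  /* loop limit: 32 GiB */
#define CHUNK_SIZE 0x8000000ULL    /* loop step and dma_size: 128 MiB */

/* Number of addr/dma_size/data_dma rounds needed to cover total bytes. */
uint64_t dma_chunk_count(uint64_t total, uint64_t chunk)
{
    assert(chunk != 0);
    return (total + chunk - 1) / chunk;
}

/* Format the address string for round i, in the same form the loop prints. */
int format_chunk_addr(char *buf, size_t len, uint64_t i, uint64_t chunk)
{
    return snprintf(buf, len, "0x%llx", (unsigned long long)(i * chunk));
}
```

With these values the loop performs 256 rounds of 128 MiB each.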
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Since we moved the SOB reset flow to a workqueue and it is
no longer part of the fence release flow, we might reach a
scenario where a new context is created while we are in the middle
of resetting the SOB.
In such cases the reset may fail due to the idle check.
This will mess up the stream sync since the SOB value is invalid.
So we protect this area with a mutex, to delay context creation.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
There is a need to allow the user to send command submissions with
a custom timeout, as some CS take longer than the max timeout that is
used by default.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The new approach is based on the notion that the relative
current power consumption is proportional
to the device's true utilization.
Utilization info ranges between [0,100]%.
Currently, dc_power values are hard-coded.
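One way to picture such a power-based estimate is the sketch below. The
formula is an assumption made for illustration (a linear scale between an
idle power level and a maximum power level), and the function and parameter
names are hypothetical; the text above does not spell out the driver's exact
calculation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a power reading onto a 0..100 utilization figure.
 * dc_power:  assumed idle ("DC") power draw
 * max_power: assumed power draw at full load; must exceed dc_power
 */
uint32_t utilization_percent(uint64_t curr_power, uint64_t dc_power,
                             uint64_t max_power)
{
    uint64_t util;

    assert(max_power > dc_power);

    /* At or below idle power counts as 0% utilization. */
    if (curr_power <= dc_power)
        return 0;

    util = (curr_power - dc_power) * 100 / (max_power - dc_power);

    /* Clamp readings above max_power to 100%. */
    return util > 100 ? 100 : (uint32_t)util;
}
```

Clamping at both ends keeps the result inside the stated [0,100] range even
when a reading falls outside the assumed idle and maximum levels.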
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to use a minimum of hard-coded values common to LKD and F/W,
a dynamic method to work with PLLs is introduced in this patch.
The formerly ASIC-specific PLL numbering is now common to all ASICs.
To be backward compatible, a bit in dev status is defined. If the bit is
not set, LKD will keep working with the old PLL numbering.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to shorten the time cs lock is being held, we move any
possible work outside of the cs lock.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add a short sleep between page unmappings in case mapping of a
large number of host pages failed, in order to
avoid a soft lockup bug during the rollback.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The device can get into a deadlock in case it uses indirect mode for MSI
interrupts (multi-msi) and a hard reset occurs during an interrupt storm.
To prevent that, always use direct mode, which means single-msi mode.
The F/W will prevent the host from writing to the indirect MSI
registers to prevent any malicious user from causing this scenario.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case the BMC of the devices' box wants to initiate a reset of
a specific device, it must go through the driver.
Once the driver receives the request, it will initiate a hard reset
flow.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
To improve debuggability, we allow debugfs access
to user MMU-mapped host memory. Non-user host memory access will be
rejected.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If the reset is due to heartbeat, the device CPU is not responsive, in which
case there is no point in sending the PCI disable message to it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Because of incorrect assumptions that some of the
initialization and data path flows cannot sleep, most allocations
are being done using GFP_ATOMIC.
We modify the code to use GFP_ATOMIC only when really needed, as
sleepable flows should use GFP_KERNEL.
In addition, add a fallback to allocate memory using GFP_KERNEL
once the ATOMIC allocation fails.
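The fallback pattern can be sketched in user-space C. The allocator
callbacks below are hypothetical stand-ins: in kernel terms the first would
be an allocation with GFP_ATOMIC and the second one with GFP_KERNEL, and the
fallback is only valid in flows that are allowed to sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an allocator; returns NULL on failure. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Try the non-sleeping allocator first and fall back to the sleeping one.
 * Only valid for callers that are actually allowed to sleep.
 */
void *alloc_with_fallback(size_t size, alloc_fn atomic_alloc,
                          alloc_fn sleeping_alloc)
{
    void *p;

    assert(atomic_alloc != NULL && sleeping_alloc != NULL);

    p = atomic_alloc(size);
    if (!p)
        p = sleeping_alloc(size);
    return p;
}

/* Illustration helper: an "atomic" allocator that always fails. */
void *failing_atomic_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Illustration helper: the "sleeping" allocator is plain malloc(). */
void *sleeping_alloc_malloc(size_t size)
{
    return malloc(size);
}
```

When the first allocator fails, the caller still gets memory from the
second one instead of an immediate error.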
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>