The pgsize bitmap is used to advertise the page sizes our hardware supports
to the IOMMU core, which will then use this information to split physically
contiguous memory regions it is mapping into page sizes that we support.
Traditionally the IOMMU core just handed us the mappings directly, after
making sure the size is an order of a 4KiB page and that the mapping has
natural alignment. To retain this behavior, we currently advertise that we
support all page sizes that are an order of 4KiB.
We are about to utilize the new IOMMU map/unmap_pages APIs. We could change
this to advertise the real page sizes we support.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720020615.4144323-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If people are going to insist on calling iommu_iova_to_phys()
pointlessly and expecting it to work, we can at least do ourselves a
favour by handling those cases in the core code, rather than repeatedly
across an inconsistent handful of drivers.
Since all the existing drivers implement the internal callback, and any
future ones are likely to want to work with iommu-dma which relies on
iova_to_phys a fair bit, we may as well remove that currently-redundant
check as well and consider it mandatory.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f564f3f6ff731b898ff7a898919bf871c2c7745a.1626354264.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make IOMMU_DEFAULT_LAZY default for when AMD_IOMMU config is set, which
matches current behaviour.
For "fullflush" param, just call iommu_set_dma_strict(true) directly.
Since we get a strict vs lazy mode print already in iommu_subsys_init(),
and maintain a deprecation print when "fullflush" param is passed, drop the
prints in amd_iommu_init_dma_ops().
Finally drop global flag amd_iommu_unmap_flush, as it has no longer has any
purpose.
[jpg: Rebase for relocated file and drop amd_iommu_unmap_flush]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1626088340-5838-6-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
as is current behaviour.
Also delete global flag intel_iommu_strict:
- In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also
remove the print, as iommu_subsys_init() prints the mode and we have
already marked this param as deprecated.
- For cap_caching_mode() check in intel_iommu_setup(), call
iommu_set_dma_strict(true) directly; also reword the accompanying print
with a level downgrade and also add the missing '\n'.
- For Ironlake GPU, again call iommu_set_dma_strict(true) directly and
keep the accompanying print.
[jpg: Remove intel_iommu_strict]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-5-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
As well as the default domain type, it's useful to know whether strict
or lazy for DMA domains, so add this info in a separate print.
The (stict/lazy) mode may be also set via iommu.strict earlyparm, but
this will be processed prior to iommu_subsys_init(), so that print will be
accurate for drivers which don't set the mode via custom means.
For the drivers which set the mode via custom means - AMD and Intel drivers
- they maintain prints to inform a change in policy or that custom cmdline
methods to change policy are deprecated.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-3-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.
Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.
No functional change.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1623850736-389584-7-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-4-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The io-pgtable code expects to operate on a single block or
granule of memory that is supported by the IOMMU hardware when
unmapping memory.
This means that when a large buffer that consists of multiple
such blocks is unmapped, the io-pgtable code will walk the page
tables to the correct level to unmap each block, even for blocks
that are virtually contiguous and at the same level, which can
incur an overhead in performance.
Introduce the unmap_pages() page table op to express to the
io-pgtable code that it should unmap a number of blocks of
the same size, instead of a single block. Doing so allows
multiple blocks to be unmapped in one call to the io-pgtable
code, reducing the number of page table walks, and indirect
calls.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-2-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
gcc doesn't care, but clang quite reasonably pointed out that the recent
commit e9ba16e68c ("smpboot: Mark idle_init() as __always_inlined to
work around aggressive compiler un-inlining") did some really odd
things:
kernel/smpboot.c:50:20: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier]
static inline void __always_inline idle_init(unsigned int cpu)
^
which not only has that duplicate inlining specifier, but the new
__always_inline was put in the wrong place of the function definition.
We put the storage class specifiers (ie things like "static" and
"extern") first, and the type information after that. And while the
compiler may not care, we put the inline specifier before the types.
So it should be just
static __always_inline void idle_init(unsigned int cpu)
instead.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
- Fix guest to host memory corruption in H_RTAS due to missing nargs
check.
- Fix guest triggerable host crashes due to bad handling of nested
guest TM state.
- Fix possible crashes due to incorrect reference counting in
kvm_arch_vcpu_ioctl().
- Two commits fixing some regressions in KVM transactional memory
handling introduced by the recent rework of the KVM code.
Thanks to Nicholas Piggin, Alexey Kardashevskiy, and Michael Neuling.
* tag 'powerpc-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
KVM: PPC: Book3S HV P9: Fix guest TM support
Pull timer fixes from Thomas Gleixner:
"A small set of timer related fixes:
- Plug a race between rearm and process tick in the posix CPU timers
code
- Make the optimization to avoid recalculation of the next timer
interrupt work correctly when there are no timers pending"
* tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix get_next_timer_interrupt() with no timers pending
posix-cpu-timers: Fix rearm racing against process tick
Pull x86 jump label fix from Thomas Gleixner:
"A single fix for jump labels to prevent the compiler from agressive
un-inlining which results in a section mismatch"
* tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
Pull EFI fixes from Thomas Gleixner:
"A set of EFI fixes:
- Prevent memblock and I/O reserved resources to get out of sync when
EFI memreserve is in use.
- Don't claim a non-existing table is invalid
- Don't warn when firmware memory is already reserved correctly"
* tag 'efi-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/mokvar: Reserve the table only if it is in boot services data
efi/libstub: Fix the efi_load_initrd function description
firmware/efi: Tell memblock about EFI iomem reservations
efi/tpm: Differentiate missing and invalid final event log table.
Pull core fix from Thomas Gleixner:
"A single update for the boot code to prevent aggressive un-inlining
which causes a section mismatch"
* tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
Pull dma-mapping fix from Christoph Hellwig:
- handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman
Skakun)
* tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes, including a DFS failover fix, two fallocate
fixes, and two trivial coverity cleanups"
* tag '5.14-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix fallocate when trying to allocate a hole.
CIFS: Clarify SMB1 code for POSIX delete file
CIFS: Clarify SMB1 code for POSIX Create
cifs: support share failover when remounting
cifs: only write 64kb at a time when fallocating a small region of a file
Pull RISC-V fixes from Palmer Dabbelt:
- properly set the memory size, which fixes 32-bit systems
- allow initrd to load anywhere in memory, rather that restricting it
to the first 256MiB
- fix the 'mem=' parameter on 64-bit systems to properly account for
the maximum supported memory now that the kernel is outside the
linear map
- avoid installing mappings into the last 4KiB of memory, which
conflicts with error values
- avoid the stack from being freed while it is being walked
- a handful of fixes to the new copy to/from user routines
* tag 'riscv-for-linus-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: __asm_copy_to-from_user: Fix: Typos in comments
riscv: __asm_copy_to-from_user: Remove unnecessary size check
riscv: __asm_copy_to-from_user: Fix: fail on RV32
riscv: __asm_copy_to-from_user: Fix: overrun copy
riscv: stacktrace: pin the task's stack in get_wchan
riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE
riscv: Make sure the linear mapping does not use the kernel mapping
riscv: Fix memory_limit for 64-bit kernel
RISC-V: load initrd wherever it fits into memory
riscv: Fix 32-bit RISC-V boot failure
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers, all of which can lead to user visible
problems in certain situations"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix NULL dereference on XCOPY completion
scsi: mpt3sas: Transition IOC to Ready state during shutdown
scsi: target: Fix protect handling in WRITE SAME(32)
scsi: iscsi: Fix iface sysfs attr detection
Pull io_uring fixes from Jens Axboe:
- Fix a memory leak due to a race condition in io_init_wq_offload
(Yang)
- Poll error handling fixes (Pavel)
- Fix early fdput() regression (me)
- Don't reissue iopoll requests off release path (me)
- Add a safety check for io-wq queue off wrong path (me)
* tag 'io_uring-5.14-2021-07-24' of git://git.kernel.dk/linux-block:
io_uring: explicitly catch any illegal async queue attempt
io_uring: never attempt iopoll reissue from release path
io_uring: fix early fdput() of file
io_uring: fix memleak in io_init_wq_offload()
io_uring: remove double poll entry on arm failure
io_uring: explicitly count entries for poll reqs
Pull block fixes from Jens Axboe:
- NVMe pull request (Christoph):
- tracing fix (Keith Busch)
- fix multipath head refcounting (Hannes Reinecke)
- Write Zeroes vs PI fix (me)
- drop a bogus WARN_ON (Zhihao Cheng)
- Increase max blk-cgroup policy size, now that mq-deadline
uses it too (Oleksandr)
* tag 'block-5.14-2021-07-24' of git://git.kernel.dk/linux-block:
nvme: set the PRACT bit when using Write Zeroes with T10 PI
nvme: fix nvme_setup_command metadata trace event
nvme: fix refcounting imbalance when all paths are down
nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
block: increase BLKCG_MAX_POLS
Pull i2c fixes from Wolfram Sang:
"Two bugfixes for the I2C subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Poll for MCF
misc: eeprom: at24: Always append device id even if label property is set.
Merge misc mm fixes from Andrew Morton:
"15 patches.
VM subsystems affected by this patch series: userfaultfd, kfence,
highmem, pagealloc, memblock, pagecache, secretmem, pagemap, and
hugetlbfs"
* akpm:
hugetlbfs: fix mount mode command line processing
mm: fix the deadlock in finish_fault()
mm: mmap_lock: fix disabling preemption directly
mm/secretmem: wire up ->set_page_dirty
writeback, cgroup: do not reparent dax inodes
writeback, cgroup: remove wb from offline list before releasing refcnt
memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction
mm: use kmap_local_page in memzero_page
mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()
kfence: skip all GFP_ZONEMASK allocations
kfence: move the size check to the beginning of __kfence_alloc()
kfence: defer kfence_test_init to ensure that kunit debugfs is created
selftest: use mmap instead of posix_memalign to allocate memory
userfaultfd: do not untag user pointers
Clean up:
The size of 0 will be evaluated in the next step. Not
required here.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Had a bug when converting bytes to bits when the cpu was rv32.
The a3 contains the number of bytes and multiple of 8
would be the bits. The LGREG is holding 2 for RV32 and 3 for
RV32, so to achieve multiple of 8 it must always be constant 3.
The 2 was mistakenly used for rv32.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210 ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>