The kernel already knows at the time of interrupt allocation whether
affinity of an interrupt can be controlled by userspace or not.
It still creates all related procfs control files with read/write
permissions. That's inconsistent and non-intuitive for system
administrators and tools.
Therefore set the file permissions to read-only for such interrupts.
[ tglx: Massage change log, fixed UP build ]
Signed-off-by: Jeff Xie <jeff.xie@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240825131911.107119-1-jeff.xie@linux.dev
Kunkun Jiang reports that for a workload involving the simultaneous startup
of a large number of VMs (for a total of about 200 vcpus), a lot of CPU
time gets spent on spinning on the tmp_mask_lock that exists as a static
raw spinlock in irq_do_set_affinity(). This lock protects a global cpumask
(tmp_mask) that is used as a temporary variable to compute the resulting
affinity.
While this is triggered by KVM issuing a irq_set_affinity() call each time
a vcpu is about to execute, it is obvious that having a single global
resource is not very scalable.
Since a cpumask can be a fairly large structure on systems with a high core
count, a stack allocation is not really appropriate. Instead, turn the
global cpumask into a per-CPU variable, removing the need for locking
altogether as the code is executed with preemption and interrupts disabled.
[ tglx: Moved the per CPU variable declaration outside of the function ]
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kunkun Jiang <jiangkunkun@huawei.com>
Link: https://lore.kernel.org/all/20240826080618.3886694-1-maz@kernel.org
Link: https://lore.kernel.org/all/a7fc58e4-64c2-77fc-c1dc-f5eb78dbbb01@huawei.com
When soft interrupt actions are called, they are passed a pointer to the
struct softirq action which contains the action's function pointer.
This pointer isn't useful, as the action callback already knows what
function it is. And since each callback handles a specific soft interrupt,
the callback also knows which soft interrupt number is running.
No soft interrupt action callback actually uses this parameter, so remove
it from the function pointer signature. This clarifies that soft interrupt
actions are global routines and makes it slightly cheaper to call them.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/20240815171549.3260003-1-csander@purestorage.com
The unification of irq_domain_create_legacy() missed the fact that
interrupts must be associated even when the Linux interrupt number provided
in the first_irq argument is 0.
This breaks all call sites of irq_domain_create_legacy() which supply 0 as
the first_irq argument.
Enforce the association for legacy domains in __irq_domain_instantiate() to
cure this.
[ tglx: Massaged it slightly. ]
Fixes: 70114e7f75 ("irqdomain: Simplify simple and legacy domain creation")
Reported-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Link: https://lore.kernel.org/all/c3379142-10bc-4f14-b8ac-a46927aeac38@gmail.com
Devices can provide multiple interrupt lines. One reason for this is that
a device has multiple subfunctions, each providing its own interrupt line.
Another reason is that a device can be designed to be used (also) on a
system where some of the interrupts can be routed to another processor.
A line often further acts as a demultiplex for specific interrupts
and has it's respective set of interrupt (status, mask, ack, ...)
registers.
Regmap supports the handling of these registers and demultiplexing
interrupts, but the interrupt domain code ends up assigning the same name
for the per interrupt line domains. This causes a naming collision in the
debugFS code and leads to confusion, as /proc/interrupts shows two separate
interrupts with the same domain name and hardware interrupt number.
Instead of adding a workaround in regmap or driver code, allow giving a
name suffix for the domain name when the domain is created.
Add a name_suffix field in the irq_domain_info structure and make
irq_domain_instantiate() use this suffix if it is given when a domain is
created.
[ tglx: Adopt it to the cleanup patch and fixup the invalid NULL return ]
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/871q2yvk5x.ffs@tglx
irq_domain_create_simple() and irq_domain_create_legacy() use
__irq_domain_instantiate(), but have extra handling of allocating interrupt
descriptors and associating interrupts in them. Some of that is duplicated.
There are also call sites which have conditonals to invoke different
interrupt domain creator functions, where one of them is usually
irq_domain_create_legacy(). Alternatively they associate the interrupts for
the legacy case after creating the domain.
Moving the extra logic of irq_domain_create_simple()/legacy() into
__irq_domain_instantiate() allows to consolidate that.
Introduce hwirq_base and virq_base members in the irq_domain_info
structure, which allows to transport the required information and add the
conditional interrupt descriptor allocation and interrupt association into
__irq_domain_instantiate().
This reduces irq_domain_create_legacy() and irq_domain_create_simple() to
trivial wrappers which fill in the info structure and allows call sites
which must support the legacy case along with more modern mechanism to
select the domain type via the parameters of the info struct.
[ tglx: Massaged change log ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/32d07bd79eb2b5416e24da9e9e8fe5955423dcf9.1723120028.git.mazziesaccount@gmail.com
mpic_of_init() contains the last case where the open coded IPI support
condition needs to be replaced with mpic_is_ipi_available() to keep the
code consistent.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On platforms where MPIC is not the top-level interrupt controller the
driver currently only supports handling of the per-CPU interrupts (the
first 29 interrupts). This is obvious from the code of
mpic_handle_cascade_irq(), which reads only one cause register.
Bound the number of available interrupts in the interrupt domain to 29 for
these platforms.
The corresponding device-trees refer only to per-CPU interrupts via MPIC,
the other interrupts are referred to via GIC.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use MPIC_PER_CPU_IRQS_NR (29) bound instead of BITS_PER_LONG (32) when
iterating the bits of the per-CPU interrupt cause register, since there
are only 29 per-CPU interrupts. The top 3 bits are always zero anyway.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The number of per-CPU interrupts is 29 (0 to 28). This is described by
the constant MPIC_MAX_PER_CPU_IRQS, set to 28 (the maximum per-CPU
interrupt).
Commit 0fa4ce746d ("irqchip/armada-370-xp: Re-enable per-CPU
interrupts at resume time") used the constant incorrectly in the
for-loop, it used the operator < instead of <=, causing it to iterate
only the first 28 interrupts (0 to 27), ignoring the last, 28th,
per-CPU interrupt.
To avoid this kind of confusions, fix this issue by renaming the constant
to MPIC_PER_CPU_IRQS_NR and set it to 29, the number of per-CPU IRQs.
Update its use in mpic_is_percpu_irq() accordingly.
Fixes: 0fa4ce746d ("irqchip/armada-370-xp: Re-enable per-CPU interrupts at resume time")
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable+noautosel@kernel.org> # The 29th interrupt is not used in any device-tree
Dynamically allocate the driver private structure. This concludes the
conversion of this driver to modern style.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In continuation of converting the driver to modern style, drop the
global pointer to the driver private structure and instead pass it
around the functions and callbacks, wherever possible. (There are 3
cases where it is not possible: mpic_cascaded_starting_cpu() and the
syscore operations mpic_suspend() and mpic_resume()).
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Put the MSI doorbell limits msi_doorbell_start, msi_doorbell_size and
msi_doorbell_mask into the driver private structure and get rid of the
corresponding functions.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In preparation for converting the driver to modern style put all the
interrupt controller private static variables into driver private
structure.
Access to these variables changes as:
main_int_base mpic->base
per_cpu_int_base mpic->per_cpu
mpic_domain mpic->domain
parent_irq mpic->parent_irq
...
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For consistency with the rest of the driver, put the __init attribute
after the return type of the mpic_ipi_init() function.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the __init attribute to the mpic_msi_init() function. It is only
called from the device initializer, and so can be dropped after boot is
complete.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop the msi_doorbell_end() function and related constants, it is not
used anymore.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop IPI_DOORBELL_START since it is not used and rename IPI_DOORBELL_END
to IPI_DOORBELL_NR.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For consistency across the driver, use the u32 type instead of unsigned
long for holding register values and return value of cpu_logical_map().
One exception is when the variable is referenced for passing into
for_each_set_bit(), in which case it has to be unsigned long.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240711160907.31012-9-kabel@kernel.org