The issue is that vulkan_device_create_internal() is only called for
devices that lavu creates by itself.
For external devices, this was never done.
This also solves some mid-function declaration warnings.
This feature fundamentally relies on host-visible VRAM, which restricts the
set of available memory types to (typically) host-visible device-local ones.
When resizable BAR is disabled, this memory type is usually limited to
e.g. 256 MiB in size, which is just plain insufficient for allocation of
general purpose GPU images, causing OOM errors on even the simplest of
commands.
The easiest solution is to disable host transfers entirely on machines
without host-addressable VRAM. In theory, we could try and recover the use
of host transfers for images which are *not* restricted to device-local
memory types, but this is rarely the case in practice, and the effort
required would exceed the benefit, especially since ReBAR is a standard
feature on all platforms recent enough to have Vulkan drivers, and only
occasionally disabled in the UEFI for by default for some hare-brained
notion of "backwards compatibiility" with ancient software.
This fixes building with Clang in MSVC mode, for x86, which was
broken in 6e49b86996 (in Nov 2024);
previously it failed with undefined symbols for the constants
defined with DECLARE_ASM_CONST, accessed via inline assembly.
Before 57861911a3, there was an
#elif defined(__GNUC__) || defined(__clang__)
case before the
#elif defined(_MSC_VER)
case for defining DECLARE_ASM_CONST, which included av_used.
(This case included the explicit "defined(__clang__)" since
f637046d3134a331e4b5a7243ac3dfb92735b8a5.)
After 57861911a3, it used the
generic definition of DECLARE_ASM_CONST that also included
av_used - which also worked for Clang in MSVC mode. But after
6e49b86996, Clang in MSVC mode
ended up using the MSVC specific variant which lacked the
av_used declaration, causing linker errors due to undefined
symbols.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fix incorrect enum value used in color primaries check by replacing
AVCOL_SPC_UNSPECIFIED with AVCOL_PRI_UNSPECIFIED.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Fixes use of bultins on clang x86_64-pc-windows-msvc which does not
define any __GNUC__. Also on other targets __GNUC__ is defined to 4 by
default, so any feature testing based on version is not really valid.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
On NVIDIA, there's a global maximum limit of approximately 112 queues,
which means it takes ONLY 7 total programs using the maximum amount of
queues to cause the driver to error out/*segfault* during initialization.
Also, each queue takes about 30ms to allocate, which quickly adds up.
This reduces the queues allocate to the minimum that we would be happy
with. Its not worth limiting decode/encode queues as they're generally
not a lot, and do help.
GCC/Clang is smart enough to emit minss/maxss the same way as these functions.
The only theoretical benefit was in x86_32, where x87 floats are used, but the
penalty of making the clipping opaque to the compiler's scheduler plus moving
values from mmx regs to xmm and back will offset any potential speedup.
x86_32 builds targetting anything made in the last two decades and a half
should use -msse -mfp=sse anyway.
Signed-off-by: James Almer <jamrial@gmail.com>
Just do it like av_frame_replace().
Also "fixes" the swapped order or arguments to av_malloc_array()
(the first is supposed to be the number of elements, the second
the size of an element) and therefore part of ticket #11620.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(I don't know whether this can be triggered for a file with
nonnegative channel count, given that src's extended data can't
have been allocated in this case.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Useful to let the compiler and static analyzers know that
something is unreachable without adding an av_assert
(which would be either dead for the compiler or add runtime
overhead) for this.
The implementation used here enforces the use of a message
to provide a reason why a particular code is supposed to be
unreachable.
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Use 16-byte alignment (align=4) instead of 4-byte (align=2) in the function and
const macros. This improves instruction fetch and NEON load performance on
modern AArch64 CPUs.
In the call to vkGetPhysicalDeviceImageFormatProperties2(), we were
previously requesting the properties of the first fallback format (e.g.
VK_FORMAT_R8_UNORM for VK_FORMAT_G8_B8R8_2PLANE_420_UNORM) instead of
the actual format in use.
We don’t do anything with it afterwards, but there is no reason to keep
querying the wrong format.
Vulkan's main issue around using BGR is simple.
The letters in the shader don't match up (rgba in shader, bgra in format).
So of course, rather than allowing "bgra" or other permutations of
formats in the shader, they went the nuclear option and spent months writing
an extension to get rid of the need to have a format in the shader to begin
with.
All this to solve a problem that should never have existed to begin with.
This fixes BGRA images since enabling WithoutFormat, as the GPU now remaps
without your involvement.
This implements support for reading and writing storage images with
no format.
The issue is that we define our images as arrays, and arrays can
only have a single type, which means that f.ex. NV12 needs two
different images, R8 and RG8.
The only driver known not to advertise support for the extension
as a whole is Intel, because they have parial support for odd formats
we never use. Therefore, just always enable it by default.
We violated the spec, which, despite the actual command buffer pool
*not* being involved in any functions which require external synchronization
of the pool, *require* external synchronization even if only the
command buffers are used.
This also has the effect of *significantly* speeding up execution
in case command buffers are contended.