This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.
Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.
The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.
This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.
To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.
As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.
The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)
Signed-off-by: Cristian Stoica <cristian.stoica@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Era 10 changes the register map.
The updates that affect the drivers:
-new version registers are added
-DBG_DBG[deco_state] field is moved to a new register -
DBG_EXEC[19:16] @ 8_0E3Ch.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On 6ull and 6sll the DCP block has a clock which needs to be explicitly
enabled.
Add minimal handling for this at probe/remove time.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently used scalar multiplication algorithm (Matthieu Rivain, 2011)
have invalid values for scalar == 1, n-1, and for regularized version
n-2, which was previously not checked. Verify that they are not used as
private keys.
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There have been a pretty ridiculous number of issues with initializing
the report structures that are copied to userspace by NETLINK_CRYPTO.
Commit 4473710df1 ("crypto: user - Prepare for CRYPTO_MAX_ALG_NAME
expansion") replaced some strncpy()s with strlcpy()s, thereby
introducing information leaks. Later two other people tried to replace
other strncpy()s with strlcpy() too, which would have introduced even
more information leaks:
- https://lore.kernel.org/patchwork/patch/954991/
- https://patchwork.kernel.org/patch/10434351/
Commit cac5818c25 ("crypto: user - Implement a generic crypto
statistics") also uses the buggy strlcpy() approach and therefore leaks
uninitialized memory to userspace. A fix was proposed, but it was
originally incomplete.
Seeing as how apparently no one can get this right with the current
approach, change all the reporting functions to:
- Start by memsetting the report structure to 0. This guarantees it's
always initialized, regardless of what happens later.
- Initialize all strings using strscpy(). This is safe after the
memset, ensures null termination of long strings, avoids unnecessary
work, and avoids the -Wstringop-truncation warnings from gcc.
- Use sizeof(var) instead of sizeof(type). This is more robust against
copy+paste errors.
For simplicity, also reuse the -EMSGSIZE return value from nla_put().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The acomp, akcipher, and kpp algorithm types already have .report
methods defined, so there's no need to duplicate this functionality in
crypto_user itself; the duplicate functions are actually never executed.
Remove the unused code.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Passing string 'name' as the format specifier is potentially hazardous
because name could (although very unlikely to) have a format specifier
embedded in it causing issues when parsing the non-existent arguments
to these. Follow best practice by using the "%s" format string for
the string 'name'.
Cleans up clang warning:
crypto/pcrypt.c:397:40: warning: format string is not a string literal
(potentially insecure) [-Wformat-security]
Fixes: a3fb1e330d ("pcrypt: Added sysfs interface to pcrypt")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
IV, rather than with data stream, resulting in incorrect decryption.
Test vectors will be added in the next patch.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In crypto_alloc_context(), a DMA pool is allocated through dma_pool_alloc()
to hold the crypto context. The meta data of the DMA pool, including the
pool used for the allocation 'ndev->ctx_pool' and the base address of the
DMA pool used by the device 'dma', are then stored to the beginning of the
pool. These meta data are eventually used in crypto_free_context() to free
the DMA pool through dma_pool_free(). However, given that the DMA pool can
also be accessed by the device, a malicious device can modify these meta
data, especially when the device is controlled to deploy an attack. This
can cause an unexpected DMA pool free failure.
To avoid the above issue, this patch introduces a new structure
crypto_ctx_hdr and a new field chdr in the structure nitrox_crypto_ctx hold
the meta data information of the DMA pool after the allocation. Note that
the original structure ctx_hdr is not changed to ensure the compatibility.
Cc: <stable@vger.kernel.org>
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
encapsulate set_cipher_mode call with another api,
preparation for specific hash behavior as needed in later patches
when SM3 introduced.
Signed-off-by: Yael Chemla <yael.chemla@foss.arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adjust hash length such that it will not be fixed and general for all algs.
Instead make it suitable for certain context information.
This is preparation for SM3 support.
Signed-off-by: Yael Chemla <yael.chemla@foss.arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for Arm TrustZone CryptoCell 713.
Note that this patch just enables using a 713 in backwards compatible mode
to 712. Newer 713 specific features will follow.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make the ARM scalar AES implementation closer to constant-time by
disabling interrupts and prefetching the tables into L1 cache. This is
feasible because due to ARM's "free" rotations, the main tables are only
1024 bytes instead of the usual 4096 used by most AES implementations.
On ARM Cortex-A7, the speed loss is only about 5%. The resulting code
is still over twice as fast as aes_ti.c. Responsiveness is potentially
a concern, but interrupts are only disabled for a single AES block.
Note that even after these changes, the implementation still isn't
necessarily guaranteed to be constant-time; see
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
of the many difficulties involved in writing truly constant-time AES
software. But it's valuable to make such attacks more difficult.
Much of this patch is based on patches suggested by Ard Biesheuvel.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In the "aes-fixed-time" AES implementation, disable interrupts while
accessing the S-box, in order to make cache-timing attacks more
difficult. Previously it was possible for the CPU to be interrupted
while the S-box was loaded into L1 cache, potentially evicting the
cachelines and causing later table lookups to be time-variant.
In tests I did on x86 and ARM, this doesn't affect performance
significantly. Responsiveness is potentially a concern, but interrupts
are only disabled for a single AES block.
Note that even after this change, the implementation still isn't
necessarily guaranteed to be constant-time; see
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
of the many difficulties involved in writing truly constant-time AES
software. But it's valuable to make such attacks more difficult.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For preventing uninitialized data to be given to user-space (and so leak
potential useful data), the crypto_stat structure must be correctly
initialized.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: cac5818c25 ("crypto: user - Implement a generic crypto statistics")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
[EB: also fix it in crypto_reportstat_one()]
[EB: use sizeof(var) rather than sizeof(type)]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All bytes of the NETLINK_CRYPTO report structures must be initialized,
since they are copied to userspace. The change from strncpy() to
strlcpy() broke this. As a minimal fix, change it back.
Fixes: 4473710df1 ("crypto: user - Prepare for CRYPTO_MAX_ALG_NAME expansion")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The simd wrapper's skcipher request context structure consists
of a single subrequest whose size is taken from the subordinate
skcipher. However, in simd_skcipher_init(), the reqsize that is
retrieved is not from the subordinate skcipher but from the
cryptd request structure, whose size is completely unrelated to
the actual wrapped skcipher.
Reported-by: Qian Cai <cai@gmx.us>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Qian Cai <cai@gmx.us>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
coccicheck currently warns of the following issues in the driver:
drivers/crypto/hisilicon/sec/sec_algs.c:864:51-66: ERROR: reference preceded by free on line 812
drivers/crypto/hisilicon/sec/sec_algs.c:864:40-49: ERROR: reference preceded by free on line 813
drivers/crypto/hisilicon/sec/sec_algs.c:861:8-24: ERROR: reference preceded by free on line 814
drivers/crypto/hisilicon/sec/sec_algs.c:860:41-51: ERROR: reference preceded by free on line 815
drivers/crypto/hisilicon/sec/sec_algs.c:867:7-18: ERROR: reference preceded by free on line 816
It would appear than on certain error paths that we may attempt reference-
after-free some memories.
This patch fixes those issues. The solution doesn't look perfect, but
having same memories free'd possibly from separate functions makes it
tricky.
Fixes: 915e4e8413 ("crypto: hisilicon - SEC security accelerator driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull UBIFS updates from Richard Weinberger:
- Full filesystem authentication feature, UBIFS is now able to have the
whole filesystem structure authenticated plus user data encrypted and
authenticated.
- Minor cleanups
* tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs: (26 commits)
ubifs: Remove unneeded semicolon
Documentation: ubifs: Add authentication whitepaper
ubifs: Enable authentication support
ubifs: Do not update inode size in-place in authenticated mode
ubifs: Add hashes and HMACs to default filesystem
ubifs: authentication: Authenticate super block node
ubifs: Create hash for default LPT
ubfis: authentication: Authenticate master node
ubifs: authentication: Authenticate LPT
ubifs: Authenticate replayed journal
ubifs: Add auth nodes to garbage collector journal head
ubifs: Add authentication nodes to journal
ubifs: authentication: Add hashes to index nodes
ubifs: Add hashes to the tree node cache
ubifs: Create functions to embed a HMAC in a node
ubifs: Add helper functions for authentication support
ubifs: Add separate functions to init/crc a node
ubifs: Format changes for authentication support
ubifs: Store read superblock node
ubifs: Drop write_node
...
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Bugfix:
- Fix build issues on architectures that don't provide 64-bit cmpxchg
Cleanups:
- Fix a spelling mistake"
* tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: fix spelling mistake, EACCESS -> EACCES
SUNRPC: Use atomic(64)_t for seq_send(64)
Pull more timer updates from Thomas Gleixner:
"A set of commits for the new C-SKY architecture timers"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: timer: gx6605s SOC timer
clocksource/drivers/c-sky: Add gx6605s SOC system timer
dt-bindings: timer: C-SKY Multi-processor timer
clocksource/drivers/c-sky: Add C-SKY SMP timer
Pull NTB updates from Jon Mason:
"Fairly minor changes and bug fixes:
NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of
private struct, and a few bug fixes"
* tag 'ntb-4.20' of git://github.com/jonmason/ntb:
ntb: idt: Alter the driver info comments
ntb: idt: Discard temperature sensor IRQ handler
ntb: idt: Add basic hwmon sysfs interface
ntb: idt: Alter temperature read method
ntb_netdev: Simplify remove with client device drvdata
NTB: transport: Try harder to alloc an aligned MW buffer
ntb: ntb_transport: Mark expected switch fall-throughs
ntb: idt: Set PCIe bus address to BARLIMITx
NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
ntb: intel: fix return value for ndev_vec_mask()
ntb_netdev: fix sleep time mismatch
Pull scheduler fixes from Ingo Molnar:
"A memory (under-)allocation fix and a comment fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/topology: Fix off by one bug
sched/rt: Update comment in pick_next_task_rt()
Pull x86 fixes from Ingo Molnar:
"A number of fixes and some late updates:
- make in_compat_syscall() behavior on x86-32 similar to other
platforms, this touches a number of generic files but is not
intended to impact non-x86 platforms.
- objtool fixes
- PAT preemption fix
- paravirt fixes/cleanups
- cpufeatures updates for new instructions
- earlyprintk quirk
- make microcode version in sysfs world-readable (it is already
world-readable in procfs)
- minor cleanups and fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
compat: Cleanup in_compat_syscall() callers
x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
objtool: Support GCC 9 cold subfunction naming scheme
x86/numa_emulation: Fix uniform-split numa emulation
x86/paravirt: Remove unused _paravirt_ident_32
x86/mm/pat: Disable preemption around __flush_tlb_all()
x86/paravirt: Remove GPL from pv_ops export
x86/traps: Use format string with panic() call
x86: Clean up 'sizeof x' => 'sizeof(x)'
x86/cpufeatures: Enumerate MOVDIR64B instruction
x86/cpufeatures: Enumerate MOVDIRI instruction
x86/earlyprintk: Add a force option for pciserial device
objtool: Support per-function rodata sections
x86/microcode: Make revision and processor flags world-readable