The expression inptr + 1 can technically be invalid: if inptr == inend,
inptr may point one element past the end of an array.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The clang default to warning for missing fall-through and it does
not support all comment-like annotation that gcc does. Use C23
[[fallthrough]] annotation instead.
proper attribute instead.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
clang issues an warning adding '{unsigned} int' to a string does not
append to the string.
Use array indexes instead of string addition (it is simpler than
add a supress warning).
Reviewed-by: Sam James <sam@gentoo.org>
In several converters, a __GCONV_ILLEGAL_INPUT result gets overwritten
with __GCONV_FULL_OUTPUT. As a result, iconv (the function) returns
E2BIG instead of EILSEQ. The iconv program does not see the original
EILSEQ failure, does not recognize the invalid input, and may
incorrectly exit successfully.
To address this, a new __flags bit is used to indicate a sticky input
error state. All __GCONV_ILLEGAL_INPUT results are replaced with a
function call that sets this new __GCONV_ENCOUNTERED_ILLEGAL_INPUT and
returns __GCONV_ILLEGAL_INPUT. The iconv program checks for
__GCONV_ENCOUNTERED_ILLEGAL_INPUT and overrides the exit status.
The converter changes introducing __gconv_mark_illegal_input are
mostly mechanical, except for the res variable initialization in
iconvdata/iso-2022-jp.c: this error gets overwritten with __GCONV_OK
and other results in the following code. If res ==
__GCONV_ILLEGAL_INPUT afterwards, STANDARD_TO_LOOP_ERR_HANDLER below
will handle it.
The __gconv_mark_illegal_input changes do not alter the errno value
set by the iconv function. This is simpler to implement than
reviewing each __GCONV_FULL_OUTPUT result and adjust it not to
override a previous __GCONV_ILLEGAL_INPUT result. Doing it that way
would also change some E2BIG errors in to EILSEQ errors, so it had to
be done conditionally (under a flag set by the iconv program only), to
avoid confusing buffer management in other applications.
Reviewed-by: DJ Delorie <dj@redhat.com>
ISO-2022-CN-EXT uses escape sequences to indicate character set changes
(as specified by RFC 1922). While the SOdesignation has the expected
bounds checks, neither SS2designation nor SS3designation have its;
allowing a write overflow of 1, 2, or 3 bytes with fixed values:
'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
All the changes are in comments or '#error' messages.
Applying this commit results in bit-identical rebuild of iconvdata/*.so
Reviewed-by: Florian Weimer <fw@deneb.enyo.de>
Now that there is no need to use a special linker script to hardening
internal data structures, remove the --with-default-link configure
option and associated definitions.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
And use a packed structure instead. The compiler generates optimized
unaligned code if the architecture supports it.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Almost all uses of rawmemchr find the end of a string. Since most targets use
a generic implementation, replacing it with strchr is better since that is
optimized by compilers into strlen (s) + s. Also fix the generic rawmemchr
implementation to use a cast to unsigned char in the if statement.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This "Old POSIX/DKUUG borrowed format" handling is original to the file
and doesn't seem to have ever been used, i.e. id/t-t-c doesn't seem to
have ever been called with argv[1] == POSIX.
Upcoming is a POSIX charmap, which would inadvertently trigger this.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch corrects the Big5-HKSCS converter to preserve the lowest 3 bits of
the mbstate_t __count data member when the converter encounters an incomplete
multibyte character.
This fixes BZ #25744.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Newer versions of GNU grep (after grep 3.7, not inclusive) will warn on
'egrep' and 'fgrep' invocations.
Convert usages within the tree to their expanded non-aliased counterparts
to avoid irritating warnings during ./configure and the test suite.
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Fangrui Song <maskray@google.com>
On 32-bit machines this has no affect. On 64-bit machines
{u}int_fast{16|32} are set as {u}int64_t which is often not
ideal. Particularly x86_64 this change both saves code size and
may save instruction cost.
Full xcheck passes on x86_64.
UTF-7-IMAP differs from UTF-7 in the followings ways (see RFC 3501[1]
for reference) :
- The shift character is '&' instead of '+'
- There is no "optional direct characters" and the "direct characters"
set is different
- There is no implicit shift back to US-ASCII from BASE64, all BASE64
sequences MUST be terminated with '-'
[1]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
Signed-off-by: Max Gautier <mg@max.gautier.name>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add infrastructure in utf-7.c to handle variants. The approach comes from
iso646.c
The variant is defined at gconv_init time and is passed as a
supplementary variable.
Signed-off-by: Max Gautier <mg@max.gautier.name>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
- Direct use of characters instead of arcane arrays
- isxbase64 is not the Modified BASE64 alphabet, but the characters who
needs to trigger an explicit shift back to US-ASCII. Make that clearer
Signed-off-by: Max Gautier <mg@max.gautier.name>
Reviewed-by: Adhemerval Zanellla <adhemerval.zanella@linaro.org>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
I (and maybe one or two others) added a (C) to the copyright notice
regardless of the contribution checklist[1] not mentioning it. Fix all
these instances so that the notice reads as "Copyright The GNU Toolchain
Authors" across the source code.
[1] https://sourceware.org/glibc/wiki/Contribution%20checklist
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Bugfix 27256 has introduced another issue:
In conversion from ISO-2022-JP-3 encoding, it is possible
to force iconv to emit extra NUL character on internal state reset.
To do this, it is sufficient to feed iconv with escape sequence
which switches active character set.
The simplified check 'data->__statep->__count != ASCII_set'
introduced by the aforementioned bugfix picks that case and
behaves as if '\0' character has been queued thus emitting it.
To eliminate this issue, these steps are taken:
* Restore original condition
'(data->__statep->__count & ~7) != ASCII_set'.
It is necessary since bits 0-2 may contain
number of buffered input characters.
* Check that queued character is not NUL.
Similar step is taken for main conversion loop.
Bundled test case follows following logic:
* Try to convert ISO-2022-JP-3 escape sequence
switching active character set
* Reset internal state by providing NULL as input buffer
* Ensure that nothing has been converted.
Signed-off-by: Nikita Popov <npv1310@gmail.com>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Remove all malloc hook uses from core malloc functions and move it
into a new library libc_malloc_debug.so. With this, the hooks now no
longer have any effect on the core library.
libc_malloc_debug.so is a malloc interposer that needs to be preloaded
to get hooks functionality back so that the debugging features that
depend on the hooks, i.e. malloc-check, mcheck and mtrace work again.
Without the preloaded DSO these debugging features will be nops.
These features will be ported away from hooks in subsequent patches.
Similarly, legacy applications that need hooks functionality need to
preload libc_malloc_debug.so.
The symbols exported by libc_malloc_debug.so are maintained at exactly
the same version as libc.so.
Finally, static binaries will no longer be able to use malloc
debugging features since they cannot preload the debugging DSO.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reinstate gconv-modules as the main file so that the configuration
files in gconv-modules.d/ become add-on configuration. With this, the
effective user visible change is that GCONV_PATH can now have
supplementary configuration in GCONV_PATH/gconv-modules.d/ in addition
to the main GCONV_PATH/gconv-modules file.
Split module configuration so that only the bare minimum charsets,
i.e. ANSI_X3.110, ISO8859-15, ISO8859-1, CP1252, UNICODE, UTF-16,
UTF-32 and UTF-7 are configured in gconv-modules.conf. The remaining
module configurations are now in gconv-modules-extra.conf.
Reviewed-by: DJ Delorie <dj@redhat.com>
Move all gconv-modules configuration files to gconv-modules.conf.
That is, the S390 extensions now become gconv-modules-s390.conf. Move
both configuration files into gconv-modules.d.
Now GCONV_PATH/gconv-modules is read only for backward compatibility
for third-party gconv modules directories.
Reviewed-by: DJ Delorie <dj@redhat.com>
This commit removes the ELF constructor and internal variables from
dlfcn/dlfcn.c. The file now serves the same purpose as
nptl/libpthread-compat.c, so it is renamed to dlfcn/libdl-compat.c.
The use of libdl-shared-only-routines ensures that libdl.a is empty.
This commit adjusts the test suite not to use $(libdl). The libdl.so
symbolic link is no longer installed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
in the same way that IBM273 was updated for bug 23290.
IBM256 and IBM424 still have holes after this change, so HAS_HOLES
is not updated.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely. This
is enforced by the first of the two asserts in iconv/skeleton.c:
/* We must run out of output buffer space in this
rerun. */
assert (outbuf == outerr);
assert (nstatus == __GCONV_FULL_OUTPUT);
This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).
Reported-by: Tavis Ormandy <taviso@gmail.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
The byte 0xfe as input to the EUC-KR conversion denotes a user-defined
area and is not allowed. The from_euc_kr function used to skip two bytes
when told to skip over the unknown designation, potentially running over
the buffer end.
The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets
share converter logic (iconvdata/ibm1364.c) which would reject
redundant shift sequences when processing input in these character
sets. This led to a hang in the iconv program (CVE-2020-27618).
This commit adjusts the converter to ignore redundant shift sequences
and adds test cases for iconv_prog hangs that would be triggered upon
their rejection. This brings the implementation in line with other
converters that also ignore redundant shift sequences (e.g. IBM930
etc., fixed in commit 692de4b396).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
An input BIG5-HKSCS character may be converted into at most 2 wchar_t
characters. After outputting the second whcar_t character (which was
saved in the converter state) we must reset the state. If we fail
to reset the state we will be stuck continually copying that
character to the output even if we have further input to consider.
We add a new test case that covers the 4 BIG5-HKSCS characters
that may become 2 wchar_t characters.
Reviewed-by: Tom Honermann <tom@honermann.net>
In two places in glibc, -Wextra produces implicit-fallthrough warnings
where there are comments about the fall-through but their wording
doesn't match one of the forms expected by the default
implicit-fallthrough level. This patch adjusts those two places to
have a comment in a form that is accepted, so avoiding the warning
(this seems preferable to only being able to use a looser level of the
warning that allows any comment at all as evidence of deliberate
fall-through).
Tested for x86_64.
* iconvdata/cns11643.h (ucs4_to_cns11643): Adjust fall-through
comment wording.
* nis/nis_call.c (__do_niscall3): Likewise.