View Issue Details

ID: 0018317
Project: CentOS-8
Category: glibc
View Status: public
Last Update: 2022-02-03 22:05
Reporter: soko246
Assigned To:
Priority: urgent
Severity: major
Reproducibility: always
Status: closed
Resolution: won't fix
Product Version: 8.4.2105
Summary: 0018317: iconv silently corrupts data
Description: Using iconv for code page conversion produces corrupted output when the "-c" flag (discard characters that cannot be converted) is used on input in which characters that *can* and *cannot* be converted appear together.
The issue only manifests for fairly large inputs (presumably > 32K).
There is no error or warning; the results are simply broken.
Steps To Reproduce: Open bash and run:
>export LANG=C
>perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c

It creates 15000 lines that mix "X", the ISO-8859-3-convertible \xe2, and the ISO-8859-3-unconvertible \xc3\x92, and feeds them into iconv for conversion to UTF-8.
I expect \xe2 to be converted, \xc3\x92 to be dropped (because of "-c") and in any case, all lines to be equal.
Something like this:
>15000 XâX�XâXXâX�XâX
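
As an independent cross-check of that expectation, here is a minimal sketch that performs the same conversion with Python's built-in `iso8859_3` codec instead of iconv. It assumes that `errors="ignore"` is a fair stand-in for `iconv -c`; the helper name `convert_discarding` is made up for illustration. In Python's table, byte 0xC3 is unassigned and gets dropped, while 0x92 maps to the control character U+0092, which is probably what the � above stands for. The extra newline that perl's `say` appends is left out here.

```python
from collections import Counter

# One input line from the report: X, 0xE2, X, 0xC3 0x92, X, 0xE2, X, twice over.
LINE = b"\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n"


def convert_discarding(data: bytes) -> bytes:
    """Decode ISO-8859-3, silently dropping unassigned bytes, and re-encode as UTF-8.

    Hypothetical helper: errors="ignore" is assumed to approximate `iconv -c`.
    """
    return data.decode("iso8859_3", errors="ignore").encode("utf-8")


output = convert_discarding(LINE * 15000)

# Equivalent of `sort | uniq -c`: count how often each distinct line occurs.
# The last split piece is the empty string after the final newline, so drop it.
counts = Counter(output.split(b"\n")[:-1])
for line, n in counts.items():
    print(n, line)
```

With this reference converter every one of the 15000 lines comes out identical, which is the property the iconv output is expected to share.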

However, I get *a mix* of broken lines.
The actual output is:
> 1
> 2 XXâX�XâX
> 2 XâX�XXâX
> 2 XâX�XâX
> 1 XâX�XâXX
> 2 XâX�XâXXâX�X�XâXXâX�XâX
> 14917 XâX�XâXXâX�XâX

As can be seen, many lines simply disappear (14917+2+1+2+2+2+1 does not add up to 15000).
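
To make that arithmetic explicit, the following sketch tallies the counts from the `uniq -c` output quoted above. The rows are copied from the report; the function name `total_count` is only illustrative. Note that the count of 1 on the otherwise blank row is included in the total (that blank line is most likely the extra newline that perl's `say` appends).

```python
# Rows copied from the `sort | uniq -c` output quoted in the report.
ACTUAL = [
    "      1",
    "      2 XXâX�XâX",
    "      2 XâX�XXâX",
    "      2 XâX�XâX",
    "      1 XâX�XâXX",
    "      2 XâX�XâXXâX�X�XâXXâX�XâX",
    "  14917 XâX�XâXXâX�XâX",
]


def total_count(rows):
    """Sum the leading count column of `uniq -c` style rows."""
    total = 0
    for row in rows:
        fields = row.split(None, 1)  # first field is the count
        total += int(fields[0])
    return total


total = total_count(ACTUAL)
print("lines accounted for:", total)
print("short of 15000 by:", 15000 - total)
```

The counts add up to 14927, so at least 73 lines are unaccounted for, in addition to the lines that came out malformed.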
Additional Information: The specific input does not matter, as long as it mixes convertible and non-convertible characters.
Reducing the number of input lines (e.g. to 1000) makes everything work as expected:
>1000 XâX�XâXXâX�XâX
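
A quick size calculation shows that the two cases fall on opposite sides of the presumed 32K mark. This is only a sketch: the 32 KiB figure is the reporter's guess, not a confirmed buffer size, and `input_size` is an illustrative helper.

```python
# One input line from the report, including its trailing newline.
LINE = b"\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n"

# Assumption: the reporter's presumed limit of 32K, taken as 32 KiB.
PRESUMED_LIMIT = 32 * 1024


def input_size(n_lines: int) -> int:
    """Bytes produced by the perl one-liner for n_lines repetitions.

    perl's `say` appends one extra newline after the repeated string.
    """
    return n_lines * len(LINE) + 1


for n in (1000, 15000):
    size = input_size(n)
    side = "above" if size > PRESUMED_LIMIT else "below"
    print(f"{n} lines -> {size} bytes ({side} the presumed limit)")
```

The 1000-line input is 17001 bytes and the 15000-line input is 255001 bytes, which is consistent with a size-dependent failure but does not by itself confirm where the threshold lies.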

I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results.
Results are broken on the latest CentOS 8.4 and RHEL 8.4, as well as on CentOS 6.10.

Using piconv (Perl variant of iconv) instead of iconv produces correct results.
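
For anyone who needs a stopgap besides piconv, below is a rough sketch of a chunked converter built on Python's incremental decoder. It is not a drop-in replacement for iconv: `convert_stream` and the chunk size are illustrative choices, and `errors="ignore"` is assumed to approximate `-c`. The chunk size is deliberately not a multiple of the line length, so chunk boundaries fall in the middle of lines.

```python
import codecs
import io

LINE = b"\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n"


def convert_stream(src, dst, chunk_size=1000):
    """Convert ISO-8859-3 to UTF-8 chunk by chunk, dropping unassigned bytes.

    Hypothetical helper; src and dst are binary file-like objects.
    """
    decoder = codecs.getincrementaldecoder("iso8859_3")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush any state the decoder may still hold.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Demonstration on the input from the report (without perl's extra newline).
source = io.BytesIO(LINE * 15000)
sink = io.BytesIO()
convert_stream(source, sink)
distinct = set(sink.getvalue().split(b"\n")[:-1])
print(len(distinct), "distinct output line(s)")
```

To use it as a filter, the same function could be called with `sys.stdin.buffer` and `sys.stdout.buffer` in place of the in-memory streams.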
Tags: codepage, glibc, iconv

Activities

toracat

2022-01-09 02:03

manager   ~0038805

CentOS Linux 8 ended its life on December 31, 2021 and, therefore, is no longer supported.

Issue History

Date Modified Username Field Change
2021-09-30 17:35 soko246 New Issue
2021-09-30 17:35 soko246 Tag Attached: codepage
2021-09-30 17:35 soko246 Tag Attached: glibc
2021-09-30 17:35 soko246 Tag Attached: iconv
2022-01-09 02:03 toracat Note Added: 0038805
2022-02-03 22:05 toracat Status new => closed
2022-02-03 22:05 toracat Resolution open => won't fix