View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0018317 | CentOS-8 | glibc | public | 2021-09-30 17:35 | 2022-02-03 22:05 |
Reporter | soko246 | Assigned To | |||
Priority | urgent | Severity | major | Reproducibility | always |
Status | closed | Resolution | won't fix | ||
Product Version | 8.4.2105 | ||||
Summary | 0018317: iconv silently corrupts data | ||||
Description | Using iconv for code page conversion produces corrupted output when the "-c" flag (discard characters that cannot be converted) is used on input in which characters that *can* and *cannot* be converted appear together. The issue only manifests for rather large inputs (presumably > 32K). There is no error or warning; the results are simply broken. | ||||
Steps To Reproduce | Open bash and run: >export LANG=C >perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c This creates 15000 lines that mix "X", the ISO-8859-3-convertible \xe2 and the ISO-8859-3-unconvertible \xc3\x92, and feeds them into iconv for conversion to UTF-8. I expect \xe2 to be converted, \xc3\x92 to be dropped (because of "-c") and, in any case, all lines to be equal. Something like this: >15000 XâX�XâXXâX�XâX However, I get *a mix* of broken lines, i.e. the actual output is: > 1 > 2 XXâX�XâX > 2 XâX�XXâX > 2 XâX�XâX > 1 XâX�XâXX > 2 XâX�XâXXâX�X�XâXXâX�XâX > 14917 XâX�XâXXâX�XâX As can be seen, many lines simply disappear (14917+2+1+2+2+2+1 does not sum to 15000). | ||||
Additional Information | The specific input does not matter, as long as it has a mix of convertible and non-convertible characters. Reducing the number of input lines to a smaller value (e.g. 1000) makes everything work as expected: >1000 XâX�XâXXâX�XâX I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results. Results are broken in the latest CentOS 8.4 and RHEL 8.4, as well as in CentOS 6.10. Using piconv (the Perl variant of iconv) instead of iconv produces correct results. | ||||
Tags | codepage, glibc, iconv | ||||
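To get an independent reference for what the pipeline in the steps above should print, the same byte stream can be converted with Python's built-in `iso8859_3` codec. This is only a sketch: it assumes that `errors="ignore"` is a fair stand-in for the intent of `iconv -c`, and the function names are made up for illustration. In Python's mapping only byte 0xC3 is unassigned, while 0x92 maps to the control character U+0092, which is probably what appears as "�" in the listings. Perl's `say` also appends one newline after the repeated string, so a single empty line in the histogram is expected.

```python
from collections import Counter

# One reproducer line from the report, newline-terminated.
LINE = b"\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n"


def build_input(repeats: int = 15000) -> bytes:
    """Mirror the Perl one-liner: the line repeated, plus the newline `say` appends."""
    return LINE * repeats + b"\n"


def reference_convert(data: bytes) -> bytes:
    """Decode as ISO-8859-3, drop unmappable bytes, re-encode as UTF-8.

    Assumption: errors="ignore" approximates what `iconv -c` is meant to do.
    """
    return data.decode("iso8859_3", errors="ignore").encode("utf-8")


def line_histogram(converted: bytes) -> Counter:
    """Count distinct output lines, similar to `sort | uniq -c`."""
    # split() leaves an empty tail after the final newline; drop it.
    return Counter(converted.split(b"\n")[:-1])


if __name__ == "__main__":
    histogram = line_histogram(reference_convert(build_input()))
    for line, count in sorted(histogram.items()):
        print(count, line)
```

With this reference, the histogram contains exactly two entries: the converted line 15000 times and one empty line. Any other distribution coming out of `iconv -c` on the same input points at the converter rather than at the data.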
Date Modified | Username | Field | Change |
---|---|---|---|
2021-09-30 17:35 | soko246 | New Issue | |
2021-09-30 17:35 | soko246 | Tag Attached: codepage | |
2021-09-30 17:35 | soko246 | Tag Attached: glibc | |
2021-09-30 17:35 | soko246 | Tag Attached: iconv | |
2022-01-09 02:03 | toracat | Note Added: 0038805 | |
2022-02-03 22:05 | toracat | Status | new => closed |
2022-02-03 22:05 | toracat | Resolution | open => won't fix |