View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0015724||CentOS-6||compat-glibc||public||2019-01-21 03:44||2019-01-21 03:44|
|Target Version||Fixed in Version|
|Summary||0015724: Standard lib C functions to handle multi-byte/widechar character sets (setlocale(), mblen(), wcwidth())|
|Description||We have a C program that uses the standard C library functions for multibyte and wide-character handling (setlocale(), mblen(), wcwidth()).|
We use the zh_TW.big5 locale.
This is BIG5, in which Asian (CJK) characters are encoded in 2 bytes.
We want to handle User-Defined Characters in the range 0xFA40-0xFA49 (the full range is 0xF9D6-0xFEFE).
However, mblen() returns -1 for these characters.
Is this expected behavior?
This is mission-critical in our application.
How can we enable support for such characters?
Is this a bug in the zh_TW.big5 locale?
Note: With zh_HK.big5hkscs, mblen() returns 2 for 0xFA40, but we assume this is expected, since this code point is assigned by the HKSCS extension to BIG5.
However, using BIG5HKSCS is not an option for us: we need to use zh_TW.big5.
How can User-Defined Characters be supported in this case?
|Steps To Reproduce||To reproduce:|
$ gcc -o wcwidth-big5.bin wcwidth-big5.c
>> byte pos: 000 bytes: 2 width: 2 remaining: [Ａ壬Ｂ @]
>> byte pos: 002 bytes: 2 width: 2 remaining: [壬Ｂ @]
>> byte pos: 004 bytes: 2 width: 2 remaining: [Ｂ @]
>> error: invalid char at 6 mblen()=-1
Looking at the source code, you can see that we fill a BIG5 string with some byte sequences that represent regular BIG5 chars and the
|Tags||No tags attached.|