View Issue Details

ID: 0015724
Project: CentOS-6
Category: compat-glibc
View Status: public
Last Update: 2019-01-21 03:44
Reporter: Barry
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Product Version: 6.7
Target Version:
Fixed in Version:
Summary: 0015724: Standard lib C functions to handle multi-byte/widechar character sets (setlocale(), mblen(), wcwidth())
Description:

We have a C program that uses the standard C library functions for multi-byte and wide-character handling (setlocale(), mblen(), wcwidth()).

We use the following locale:

    LC_ALL=zh_TW.big5

This is BIG5, where each Asian character is encoded in 2 bytes.

We want to handle User-Defined Characters (UDC) in the range 0xFA40-0xFA49 (the full UDC range is 0xF9D6-0xFEFE).

However, it appears that the mblen() function returns -1 for such characters.

Is this normal/expected?

This is mission-critical in our application.

How can we enable the support of such characters?

Is it a bug with zh_TW.big5?

Note: When trying with zh_HK.big5hkscs, mblen() returns 2 for 0xFA40, but we assume this is expected, since this code is used by the HKSCS extension to BIG5.

However, using BIG5HKSCS is not an option; we need to use zh_TW.big5.

How can User-Defined Characters be supported in this case?
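To make the comparison between the two locales concrete, here is a minimal sketch of a probe that asks mblen() about the 0xFA40 byte pair under a given locale. The helper names probe_mblen() and report_udc() and the PROBE_NO_LOCALE sentinel are made up for this illustration and are not taken from the attached file; the sketch only assumes that the locale names used in this report are installed on the test system.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sentinel: the requested locale could not be selected. */
#define PROBE_NO_LOCALE (-2)

/*
 * Hypothetical helper: switch LC_ALL to `locale_name` and return what
 * mblen() reports for the first character in `bytes` (at most `len` bytes).
 */
int probe_mblen(const char *locale_name, const char *bytes, size_t len)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return PROBE_NO_LOCALE;
    (void)mblen(NULL, 0);      /* reset any internal conversion state */
    return mblen(bytes, len);
}

/* Print the probe result for the UDC code 0xFA40 under one locale. */
void report_udc(const char *locale_name)
{
    static const char udc[] = { (char)0xFA, (char)0x40, '\0' };
    int n = probe_mblen(locale_name, udc, 2);

    if (n == PROBE_NO_LOCALE)
        printf("%-18s locale not available\n", locale_name);
    else
        printf("%-18s mblen(0xFA40) = %d\n", locale_name, n);
}
```

Calling report_udc("zh_TW.big5") and report_udc("zh_HK.big5hkscs") from main() would show both results side by side. Going by the behaviour described above, the first would be expected to report -1 and the second 2 on the affected system.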
Steps To Reproduce:

$ gcc -o wcwidth-big5.bin wcwidth-big5.c

$ ./wcwidth-big5.bin
 >> byte pos: 000 bytes: 2 width: 2 remaining: [A壬B @]
 >> byte pos: 002 bytes: 2 width: 2 remaining: [壬B @]
 >> byte pos: 004 bytes: 2 width: 2 remaining: [B @]
 >> error: invalid char at 6 mblen()=-1


As the source code shows, we fill a BIG5 string with byte sequences that represent regular BIG5 characters and the 0xFA40 UDC.
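For readers without access to the attachment, the following is a rough sketch of the kind of loop that produces output like the above. It is not the contents of wcwidth-big5.c. The function name scan_multibyte() is an assumption made for this illustration, and the caller is assumed to have already selected the locale with setlocale().

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth() declaration on glibc */
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: walk a multibyte string in the current locale and
 * print the byte length and display width of each character.
 * Returns the byte offset of the first sequence mblen() rejects,
 * or -1 if the whole string was decoded.
 */
long scan_multibyte(const char *s)
{
    size_t len = strlen(s);
    size_t pos = 0;

    (void)mblen(NULL, 0);                 /* reset conversion state */
    while (pos < len) {
        int n = mblen(s + pos, len - pos);
        if (n <= 0) {
            printf(" >> error: invalid char at %zu mblen()=%d\n", pos, n);
            return (long)pos;
        }

        wchar_t wc = L'\0';
        (void)mbtowc(&wc, s + pos, (size_t)n);   /* decode for wcwidth() */
        printf(" >> byte pos: %03zu bytes: %d width: %d remaining: [%s]\n",
               pos, n, wcwidth(wc), s + pos);
        pos += (size_t)n;
    }
    return -1;
}
```

With LC_ALL=zh_TW.big5 in the environment, a main() that calls setlocale(LC_ALL, "") and then scan_multibyte("\xA4\xA4\xFA\x40") (one regular BIG5 character followed by the 0xFA40 UDC) would, going by the behaviour reported here, be expected to stop at byte offset 2.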
Tags: No tags attached.

Activities

Barry (reporter)
2019-01-21 03:44

wcwidth-big5.c (1,303 bytes)

Issue History

Date Modified     Username  Field                       Change
2019-01-21 03:44  Barry     New Issue
2019-01-21 03:44  Barry     File Added: wcwidth-big5.c