U+034F Combining Grapheme Joiner
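U+034F was added in Unicode version 3.2 in 2002. It belongs to the block Combining Diacritical Marks in the Basic Multilingual Plane.
This character is a Nonspacing Mark and inherits its script property from the preceding character.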
The glyph is not a composition. Its width in East Asian texts is determined by its context: it can be displayed wide or narrow. In bidirectional text it acts as a Nonspacing Mark. When changing direction it is not mirrored. U+034F prohibits a line break around it.
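The properties described above can be checked against the Unicode database bundled with a Python interpreter. This is a minimal sketch using the standard `unicodedata` module; the dictionary keys are illustrative labels, not official property names, and the reported values depend on the Unicode version shipped with the interpreter.

```python
import unicodedata

cgj = "\u034f"  # COMBINING GRAPHEME JOINER

# Collect the properties that unicodedata exposes for this code point.
props = {
    "name": unicodedata.name(cgj),
    "general category": unicodedata.category(cgj),          # "Mn" = Nonspacing Mark
    "combining class": unicodedata.combining(cgj),          # 0 = Not Reordered
    "bidi class": unicodedata.bidirectional(cgj),           # "NSM" = Nonspacing Mark
    "east asian width": unicodedata.east_asian_width(cgj),  # "A" = ambiguous
    "mirrored": bool(unicodedata.mirrored(cgj)),
    "decomposition": unicodedata.decomposition(cgj) or "none",
}

for label, value in props.items():
    print(f"{label}: {value}")
```

Line-breaking behavior and the script property are not exposed by `unicodedata`; a third-party library would be needed to query those.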
Wikipedia has the following information about this code point:
The combining grapheme joiner (CGJ), U+034F ͏ COMBINING GRAPHEME JOINER is a Unicode character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer and does not describe its function: the character does not join graphemes. Its purpose is to semantically separate characters that should not be considered digraphs as well as to block canonical reordering of combining marks during normalization.
For example, in a Hungarian language context, adjoining letters c and s would normally be considered equivalent to the cs digraph. If they are separated by the CGJ, they will be considered as two separate graphemes. However, in contrast to the zero-width joiner and similar characters, the CGJ does not affect whether the two letters are rendered separately or as a ligature or cursively joined—the default behavior for this is determined by the font.
The CGJ is also needed for complex scripts. For example, in most cases the Hebrew cantillation accent metheg is supposed to appear to the left of the vowel point and by default most display systems will render it like this even if it is typed before the vowel. But in some words in Biblical Hebrew the metheg appears to the right of the vowel, and to tell the display engine to render it properly on the right, CGJ must be typed between the metheg and the vowel.
In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.
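The effect on canonical reordering can be observed with Python's standard `unicodedata` module. The sketch below is illustrative only: the sample strings (a Latin letter with two diacritics, and the Hebrew letter bet with meteg and patah) and the helper `codepoints` are assumptions chosen for the demonstration, not taken from the text above.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, combining class 0


def codepoints(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


samples = {
    # Acute (class 230) typed before dot below (class 220).
    "latin": "a\u0301\u0323",
    "latin + CGJ": "a\u0301" + CGJ + "\u0323",
    # Meteg (class 22) typed before patah (class 17) on the letter bet.
    "hebrew": "\u05d1\u05bd\u05b7",
    "hebrew + CGJ": "\u05d1\u05bd" + CGJ + "\u05b7",
}

for label, text in samples.items():
    nfd = unicodedata.normalize("NFD", text)
    status = "reordered" if nfd != text else "unchanged"
    print(f"{label}: {codepoints(text)} -> {codepoints(nfd)} ({status})")
```

Without the CGJ, normalization sorts adjacent marks by combining class, so the typed order is lost. Because the CGJ has combining class 0, it splits the run of marks into two independent sequences and the typed order survives.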
In contrast, the "zero-width non-joiner" (at U+200C in the General Punctuation range) prevents two adjacent characters from turning into a ligature.
Representations
System | Representation |
---|---|
Nº | 847 |
UTF-8 | CD 8F |
UTF-16 | 03 4F |
UTF-32 | 00 00 03 4F |
URL-Quoted | %CD%8F |
HTML hex reference | &#x034F; |
Wrong windows-1252 Mojibake | Í followed by the unassigned byte 8F |
abbreviation | CGJ |
Encoding: GB18030 (hex bytes) | 81 30 C4 35 |
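Most rows of the table above can be recomputed with the Python standard library. The following is a minimal sketch; the helper `hex_bytes` and the dictionary keys are illustrative names, and big-endian byte order is assumed for UTF-16 and UTF-32 to match the table.

```python
from urllib.parse import quote

cgj = "\u034f"  # COMBINING GRAPHEME JOINER


def hex_bytes(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


representations = {
    "Nº": str(ord(cgj)),
    "UTF-8": hex_bytes(cgj.encode("utf-8")),
    "UTF-16": hex_bytes(cgj.encode("utf-16-be")),
    "UTF-32": hex_bytes(cgj.encode("utf-32-be")),
    "URL-Quoted": quote(cgj),  # percent-encodes the UTF-8 bytes
    "HTML hex reference": f"&#x{ord(cgj):04X};",
    "GB18030": hex_bytes(cgj.encode("gb18030")),
}

for system, value in representations.items():
    print(f"{system} | {value} |")
```

The windows-1252 row is not reproduced here because byte 8F has no mapping in Python's `cp1252` codec, so a strict decode of the UTF-8 bytes raises an error.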
Complete Record
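Property | Value |
---|---|
Age | 3.2 (2002) |
Unicode Name | COMBINING GRAPHEME JOINER |
Unicode 1 Name | — |
Block | Combining Diacritical Marks |
General Category | Nonspacing Mark |
Script | Inherited |
Bidirectional Category | Nonspacing Mark |
Combining Class | Not Reordered |
Decomposition Type | none |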
Grapheme Cluster Break | Extend |
Word Break | Extend |
Sentence Break | Extend |
Indic Positional Category | NA |
Indic Syllabic Category | Other |
NFC Quick Check | Yes |
NFD Quick Check | Yes |
NFKC Quick Check | Yes |
NFKD Quick Check | Yes |
Bidi Paired Bracket Type | None |
East Asian Width | ambiguous |
Hangul Syllable Type | Not Applicable |
Joining Group | No_Joining_Group |
Joining Type | Transparent |
Line Break | Non-breaking (“Glue”) |
Numeric Type | none |
Numeric Value | not a number |
Vertical Orientation | R |