U+FEFF ZERO WIDTH NO-BREAK SPACE

U+FEFF was added to Unicode in version 1.1 (1993). It belongs to the block Arabic Presentation Forms-B (U+FE70 to U+FEFF) in the Basic Multilingual Plane (U+0000 to U+FFFF).

This character is a Format character and has the script Common, that is, it is not tied to any specific script. It is also known as BOM and ZWNBSP.

The character has no decomposition. Its East Asian Width is Neutral. In bidirectional context it acts as Boundary Neutral and is not mirrored. For line breaking, U+FEFF behaves as a Word Joiner. Its type is Format for both sentence and word breaks. Its Grapheme Cluster Break is Control.
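
Several of these properties can be checked programmatically. The following is a minimal sketch, assuming Python's standard unicodedata module; the values it reports depend on the Unicode version bundled with the interpreter, and the module does not expose the line, word, sentence, or grapheme break properties.

```python
import unicodedata

ch = "\ufeff"

# Each call queries the Unicode Character Database bundled with Python.
print(unicodedata.name(ch))                 # official character name
print(unicodedata.category(ch))             # "Cf" is General Category Format
print(unicodedata.bidirectional(ch))        # "BN" is Boundary Neutral
print(unicodedata.east_asian_width(ch))     # "N" is Neutral
print(unicodedata.mirrored(ch))             # 0 means not mirrored
print(repr(unicodedata.decomposition(ch)))  # empty string: no decomposition
```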

Wikipedia has the following information about this code point:

The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFF BYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

  • The byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
  • The fact that the text stream's encoding is Unicode, to a high level of confidence;
  • Which Unicode character encoding is used.

BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.
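
To illustrate the interference, here is a small sketch assuming Python's built-in codecs. The plain "utf-8" codec keeps the BOM as a leading character, while "utf-8-sig" removes it; the sample bytes are made up for the example.

```python
raw = b"\xef\xbb\xbfhello"  # UTF-8 BOM followed by ASCII text

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops one leading BOM if it is present.
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom))     # '\ufeffhello'
print(repr(without_bom))  # 'hello'
```

Software that compares the first characters of a file against an expected ASCII prefix fails on the first variant and succeeds on the second.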

Unicode can be encoded in units of 8-bit, 16-bit, or 32-bit integers. For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. The BOM is encoded in the same scheme as the rest of the document and becomes a noncharacter Unicode code point if its bytes are swapped. Hence, the process accessing the text can examine these first few bytes to determine the endianness, without requiring some contract or metadata outside of the text stream itself. Generally the receiving computer will swap the bytes to its own endianness, if necessary, and would no longer need the BOM for processing.
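
The endianness check described above can be sketched as follows. This is an illustration only: the helper names utf16_byte_order and decode_utf16 are hypothetical, and the code assumes the input is already known to be UTF-16 (the bytes FF FE could otherwise also begin a UTF-32 little-endian stream).

```python
def utf16_byte_order(data: bytes):
    """Return "big", "little", or None based on the first two bytes."""
    head = data[:2]
    if head == b"\xfe\xff":
        return "big"
    if head == b"\xff\xfe":
        return "little"
    return None


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 text using its BOM, then drop the BOM."""
    order = utf16_byte_order(data)
    if order is None:
        raise ValueError("no UTF-16 BOM found")
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return data[2:].decode(codec)


# Reading the BOM bytes in the wrong order yields U+FFFE, a noncharacter.
swapped = int.from_bytes(b"\xfe\xff", "little")
print(hex(swapped))  # 0xfffe

print(decode_utf16(b"\xff\xfeh\x00i\x00"))  # hi
print(decode_utf16(b"\xfe\xff\x00h\x00i"))  # hi
```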

The byte sequence of the BOM differs per Unicode encoding (including ones outside the Unicode standard such as UTF-7, see table below), and none of the sequences is likely to appear at the start of text streams stored in other encodings. Therefore, placing an encoded BOM at the start of a text stream can indicate that the text is Unicode and identify the encoding scheme used. This use of the BOM character is called a "Unicode signature".
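
A signature check along these lines might look like the sketch below. It uses the BOM constants from Python's codecs module; the function name sniff_signature and the returned codec labels are assumptions for the example, and UTF-7 is left out. The longer UTF-32 signatures are tested first because the UTF-32 little-endian BOM begins with the UTF-16 little-endian BOM.

```python
import codecs

# Longer signatures first: FF FE 00 00 (UTF-32 LE) starts with FF FE (UTF-16 LE).
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_signature(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no signature matches."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_signature(b"\xef\xbb\xbfabc"))  # ('utf-8', 3)
print(sniff_signature(b"abc"))              # (None, 0)
```

The result is a heuristic: UTF-16 little-endian text whose first character after the BOM is U+0000 would be reported as UTF-32 little-endian.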

Representations

System Representation
Decimal 65279
UTF-8 EF BB BF
UTF-16 FE FF
UTF-32 00 00 FE FF
URL-Quoted %EF%BB%BF
HTML-Escape &#xFEFF;
Wrong windows-1252 Mojibake ï»¿
alternate BYTE ORDER MARK
abbreviation BOM
abbreviation ZWNBSP
alias BOM
alias ZWNBSP
Adobe Glyph List zerowidthjoiner
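
The encoded representations listed above can be recomputed with a short script. This sketch assumes Python 3.8 or later (for the separator argument of bytes.hex) and picks the big-endian codecs, which match the UTF-16 and UTF-32 byte sequences in the list; the variable names are chosen for the example.

```python
import urllib.parse

ch = "\ufeff"

decimal = ord(ch)                                # code point as a decimal number
utf8 = ch.encode("utf-8").hex(" ").upper()       # UTF-8 bytes
utf16 = ch.encode("utf-16-be").hex(" ").upper()  # UTF-16, big-endian
utf32 = ch.encode("utf-32-be").hex(" ").upper()  # UTF-32, big-endian
url_quoted = urllib.parse.quote(ch)              # percent-encoded UTF-8
html_escape = f"&#x{decimal:X};"                 # hexadecimal character reference
# Mojibake: UTF-8 bytes misread as windows-1252 text.
mojibake = ch.encode("utf-8").decode("cp1252")

for label, value in [
    ("Decimal", decimal),
    ("UTF-8", utf8),
    ("UTF-16", utf16),
    ("UTF-32", utf32),
    ("URL-Quoted", url_quoted),
    ("HTML-Escape", html_escape),
    ("Mojibake", mojibake),
]:
    print(label, value)
```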

Complete Record

Property Value
Age 1.1 (1993)
Unicode Name ZERO WIDTH NO-BREAK SPACE
Unicode 1 Name BYTE ORDER MARK
Block Arabic Presentation Forms-B
General Category Format
Script Common
Bidirectional Category Boundary Neutral
Combining Class Not Reordered
Decomposition Type None
Decomposition Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Lowercase
Simple Lowercase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Lowercase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Uppercase
Simple Uppercase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Uppercase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Simple Titlecase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Titlecase Mapping U+FEFF ZERO WIDTH NO-BREAK SPACE
Case Folding U+FEFF ZERO WIDTH NO-BREAK SPACE
ASCII Hex Digit
Alphabetic
Bidi Control
Bidi Mirrored
Bidi Paired Bracket U+FEFF ZERO WIDTH NO-BREAK SPACE
Bidi Paired Bracket Type None
Cased
Composition Exclusion
Case Ignorable
Full Composition Exclusion
Changes When Casefolded
Changes When Casemapped
Changes When NFKC Casefolded
Changes When Lowercased
Changes When Titlecased
Changes When Uppercased
Dash
Deprecated
Default Ignorable Code Point
Diacritic
East Asian Width Neutral
Emoji Modifier Base
Emoji Component
Emoji Modifier
Emoji
Emoji Presentation
Extender
Extended Pictographic
FC NFKC Closure U+FEFF ZERO WIDTH NO-BREAK SPACE
Grapheme Cluster Break Control
Grapheme Base
Grapheme Extend
Grapheme Link
Hex Digit
Hangul Syllable Type Not Applicable
Hyphen
ID Continue
Ideographic
ID Start
IDS Binary Operator
IDS Trinary Operator
Indic Matra Category
Indic Positional Category NA
Indic Syllabic Category Other
ISO 10646 Comment
Joining Group No_Joining_Group
Join Control
Jamo Short Name
Joining Type Transparent
Line Break Word Joiner
Logical Order Exception
Math
Noncharacter Code Point
NFC Quick Check Yes
NFD Quick Check Yes
NFKC Quick Check Yes
NFKD Quick Check Yes
Numeric Type None
Numeric Value not a number
Other Alphabetic
Other Default Ignorable Code Point
Other Grapheme Extend
Other ID Continue
Other ID Start
Other Lowercase
Other Math
Other Uppercase
Pattern Syntax
Pattern White Space
Prepended Concatenation Mark
Quotation Mark
Radical
Regional Indicator
Sentence Break Format
Simple Case Folding U+FEFF ZERO WIDTH NO-BREAK SPACE
Script Extension
Soft Dotted
Sentence Terminal
Terminal Punctuation
Unified Ideograph
Vertical Orientation R
Variation Selector
Word Break Format
White Space
XID Continue
XID Start
Expands On NFC
Expands On NFD
Expands On NFKC
Expands On NFKD