Homoglyph

In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar. The designation is also applied to sequences of characters sharing these properties. Synoglyphs, by contrast, are glyphs that look different but mean the same thing; they are also known informally as display variants. The term homograph is sometimes used synonymously with homoglyph, but in the usual linguistic sense homographs are words that are spelled the same but have different meanings, which is a property of words, not characters.

In 2008, the Unicode Consortium published its Technical Report #36 on a range of issues deriving from the visual similarity of characters, both within a single script and between different scripts.

A historical example of homoglyphic confusion results from the use of a 'y' to represent a 'þ' when setting older English texts in typefaces that do not contain the latter character. In modern times this has led to such phenomena as "Ye olde shoppe", implying incorrectly that the word "the" was formerly written "ye" /jiː/. For further discussion, see thorn.

Examples of homoglyphic symbols are (a) the diaeresis and umlaut (both a pair of dots, but with different meanings, although encoded with the same code points); and (b) the hyphen and minus sign (both a short horizontal stroke, but with different meanings, although often encoded with the same code point). Among digits and letters, the digit 1 and lowercase l, and likewise the digit 0 and capital O, are always encoded separately but are given very similar glyphs in many fonts.
Virtually every homoglyphic pair of characters can be differentiated graphically by giving it clearly distinguishable glyphs and separate code points, but this is not always done. Typefaces that do not emphatically distinguish the one/el and zero/oh homoglyphs are considered unsuitable for writing formulas, URLs, source code, IDs and other text where characters cannot always be differentiated without context. Fonts that distinguish such glyphs, for example by means of a slashed zero, are preferred for those uses.

In the days of mechanical typewriters, the diaeresis and umlaut were typed with the same key, which was also used for the double inverted comma. The umlaut, however, originated specifically as a pair of short vertical lines rather than two dots (see Sütterlin). Incidentally, the two dots above the letter E in Albanian are described as a diaeresis but do not fulfil the function of a diaeresis.
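
The difference between how a character looks and how it is encoded can be checked directly. The following is a minimal sketch in Python using only the standard unicodedata module; the helper names describe and scripts_used, the sample strings, and the Latin/Cyrillic pair are illustrative assumptions rather than anything taken from the Unicode report, and the script guess is a rough heuristic based on character names, not the official Unicode Script property.

```python
import unicodedata

# Pairs that many fonts render alike but that Unicode encodes separately.
PAIRS = [
    ("1", "l"),        # digit one vs. lowercase el
    ("0", "O"),        # digit zero vs. capital oh
    ("-", "\u2212"),   # hyphen-minus vs. dedicated minus sign
    ("a", "\u0430"),   # Latin a vs. Cyrillic a (a cross-script pair)
]

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def scripts_used(text):
    """Guess the scripts in text from the first word of each letter's name.

    Heuristic for illustration only: it is not the Unicode Script property.
    """
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

if __name__ == "__main__":
    for left, right in PAIRS:
        print(f"{describe(left)}  |  {describe(right)}")

    # The second string replaces the first Latin "a" with a Cyrillic "a".
    print(scripts_used("paypal"))
    print(scripts_used("p\u0430ypal"))
```

A string that mixes scripts in this way is not necessarily malicious, so a check of this kind is at most a first signal that a visually familiar string may contain homoglyphs from another script.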
