Curlie

Character Encoding

Information, resources and products related to international character encoding, national character sets and character conversion issues.

Character sets that relate to specific alphabets should go in the relevant subcategory.

Technical information relating only to Unicode should go in the Unicode subcategory.

Arabic

Encodings for the Arabic script, which is used to write languages including Arabic, Persian/Farsi and Kurdish.

For all Arabic encodings, including Unicode.

Chinese

Simplified and Traditional Chinese character encoding systems.

Includes Unicode in addition to other Chinese code pages.

CJKV stands for Chinese, Japanese, Korean and Vietnamese. The acronym describes these East Asian languages and their writing systems (for Vietnamese, the historical Chinese-derived scripts). These writing systems contain far more than 256 characters, so they cannot fit into a single-byte encoding and need more than one byte for most characters. CJKV is a term used particularly in Globalization; this category deals with the process in general, and separate categories exist for the individual languages.
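To illustrate the multibyte point, the short Python sketch below counts how many bytes each character takes in four legacy East Asian encodings. The sample strings are illustrative choices, and the codec names (`gb2312`, `big5`, `shift_jis`, `euc_kr`) are those of Python's standard codecs; the helper `bytes_per_char` is a hypothetical name used only here.

```python
# Count the bytes each character occupies in legacy East Asian encodings.
# Sample text and codec choices are illustrative assumptions.
samples = {
    "gb2312": "中文",       # Simplified Chinese
    "big5": "中文",         # Traditional Chinese
    "shift_jis": "日本語",  # Japanese
    "euc_kr": "한국어",     # Korean
}


def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the encoded length of each character of text (hypothetical helper)."""
    return [len(ch.encode(encoding)) for ch in text]


for encoding, text in samples.items():
    # Every CJK character here needs two bytes.
    print(encoding, text, bytes_per_char(text, encoding))

# ASCII letters stay single-byte in these encodings, so byte counts vary per character.
print("shift_jis", "A日", bytes_per_char("A日", "shift_jis"))  # → [1, 2]
```

Because ASCII stays at one byte while CJK characters take two, these encodings are variable-width, which is one reason conversion between them needs care.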

Cyrillic

The Cyrillic script is used by many languages, including Russian, Ukrainian, Bulgarian, Macedonian, Serbian, Belarusian, Kurdish, Kazakh, Kyrgyz, Mongolian and Uzbek.

For Cyrillic encoding methods only, including Unicode. Many languages that use Cyrillic are also written in Latin, Arabic or other scripts.
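Several incompatible single-byte Cyrillic code pages were in wide use, which is a classic source of conversion problems. The Python sketch below is a minimal illustration under the assumption that Python's standard codecs `cp1251`, `koi8_r`, `iso8859_5` and `cp866` are available; it shows that the same word gets different bytes in each, that decoding with the wrong code page silently produces the wrong letters, and that a correct conversion decodes with the source code page before encoding with the target. The `convert` helper is hypothetical.

```python
# Compare the same Russian word in four legacy Cyrillic code pages.
word = "Привет"  # "hello"; an illustrative sample
codecs_to_try = ["cp1251", "koi8_r", "iso8859_5", "cp866"]

for codec in codecs_to_try:
    data = word.encode(codec)
    # One byte per letter in every code page, but the byte values differ.
    print(f"{codec:10} {data.hex(' ')}")

# Decoding KOI8-R bytes as Windows-1251 raises no error but yields wrong letters.
koi8_bytes = word.encode("koi8_r")
garbled = koi8_bytes.decode("cp1251")
print(garbled)


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code page to another via Unicode (hypothetical helper)."""
    return data.decode(source).encode(target)


cp1251_bytes = convert(koi8_bytes, "koi8_r", "cp1251")
print(cp1251_bytes.decode("cp1251"))  # the original word again
```

Going through Unicode as the intermediate form avoids having to maintain a separate mapping table for every pair of code pages.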

Greek

Modern Greek and Coptic character sets. Greek is a widely spoken modern language, whereas Coptic survives mainly as a liturgical language that is still in use in the Middle East.

For all Greek and Coptic encodings, including Unicode.

Hangul

Hangul is the Korean alphabet. Its letters are grouped into square syllable blocks that resemble Chinese characters in shape, but the system is otherwise unique to Korea and, like European alphabets, writes individual consonants and vowels.

For all Korean and Hangul encoding systems, including Unicode.

Hebrew

Hebrew, Yiddish and Ladino alphabets.

For non-Latin encoding systems for Hebrew, Yiddish and Ladino, including Unicode.

Indic

Bengali, Devanagari, Gujarati, Gurmukhi, Hindi, Kannada, Khmer, Lao, Malayalam, Marathi, Nepali, Oriya, Sanskrit, Sinhala, Tamil, Telugu, Tibetan and Thai character sets all use variants of scripts descended from the ancient Brahmi script. (Hindi, Marathi, Nepali and Sanskrit are written in Devanagari.)

For all Indic character sets, including Unicode, ISCII and related encoding systems.

Japanese

Japanese text combines several writing systems, from the traditional Kanji to the Latin-derived Romaji, and various character encodings have been developed to represent them.

For the various encoding systems used in Japan, covering Hiragana, Katakana, Kanji and Romaji.

Latin

The Latin script is used by Afrikaans, Albanian, Aymara, Azeri, Balinese, Basque, Breton, Catalan, Cornish, Danish, Dutch/Nederlands, English, Esperanto, Finnish, French, Gaelic, German, Icelandic, Indonesian, Irish, Italian, Malay, Manx, Norwegian, Portuguese, Spanish, Swedish, Tagalog, Vietnamese, Welsh and many other languages.

For Latin character sets only. Many languages also use a local alphabet.

Native American

There are many languages native to North and South America, such as Cree, Navajo, the Mayan languages, Nahuatl (Aztec), Quechua (Inca) and Inuktitut (Inuit).

For all Native American encoding systems, typically Unicode character sets.

Unicode

Unicode is the standard character set, with encoding forms such as UTF-8 and UTF-16, that allows virtually every character of the world's writing systems to be correctly represented, displayed and entered.

Unicode submissions specific to a particular script or character set should go in the relevant category. This category is for general Unicode issues.
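As a small illustration of the general idea, the Python sketch below takes one sample character from several of the scripts covered by this directory and prints its Unicode code point and its UTF-8 bytes. The sample characters are illustrative choices, written as escapes so that each code point is unambiguous. It shows that a single Unicode encoding form covers all of them, with UTF-8 using one to four bytes per character depending on the code point.

```python
# One sample character per script; escapes make each code point explicit.
samples = {
    "Latin": "\u00e9",               # é LATIN SMALL LETTER E WITH ACUTE
    "Greek": "\u03a9",               # Ω GREEK CAPITAL LETTER OMEGA
    "Cyrillic": "\u0416",            # Ж CYRILLIC CAPITAL LETTER ZHE
    "Hebrew": "\u05d0",              # א HEBREW LETTER ALEF
    "Arabic": "\u0639",              # ع ARABIC LETTER AIN
    "Devanagari": "\u0915",          # क DEVANAGARI LETTER KA
    "Hangul": "\ud55c",              # 한 HANGUL SYLLABLE HAN
    "CJK": "\u4e2d",                 # 中 CJK UNIFIED IDEOGRAPH-4E2D
    "Canadian Syllabics": "\u1403",  # ᐃ CANADIAN SYLLABICS I (used for Inuktitut)
}

for script, ch in samples.items():
    utf8 = ch.encode("utf-8")
    # Code points below U+0800 take 2 bytes here; U+0800 to U+FFFF take 3.
    print(f"{script:20} U+{ord(ch):04X}  UTF-8: {utf8.hex(' ')}  ({len(utf8)} bytes)")
```

Unlike the legacy code pages, every string in this sketch can be mixed freely in a single UTF-8 document without any switching between encodings.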