Korean Character Composer
Make-up of Korean Characters
Modern written Korean uses a script known as Hangul, which is a very regimented writing system. It was originally developed in the 15th century by ruler Sejong the Great (1397-1450) as a system that the lower classes, in particular, would find easy to learn. It became Korea's official writing system in 1894 but it did not, for the most part, supplant the usage of Chinese script until the late 20th century.
Let's take a simple example of a sentence in Korean: "The man built a house." In Korean this is: 그 사람은 집을 지었다. As with English and other Western languages, words are divided up by spaces. The third word, corresponding to "house", is 집을 and consists of two characters: 집 and 을. Unlike English, where an individual letter can represent part of a syllable or a whole syllable, each character in Korean corresponds directly to one whole syllable. So we can see that the word for "house", as written here (집 itself plus the object marker 을), has two syllables.
But each character gives us further information about the make-up of the syllable concerned. If we take the first character of our word, 집, we see that it appears to be made up of three units: top left, a symbol a bit like a letter π (pi); top right, a vertical line; and something like a half-filled glass of water at the bottom.
These sub-units of a Korean character are known as jamo and there are always either two or three of them per character. The jamo represent the initial, medial (middle) and final sound of the syllable represented by the whole character.
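As a rough illustration of this breakdown, the sketch below uses Python's standard `unicodedata` module to split a character into its jamo. The function name `split_into_jamo` is made up for this example; it relies on Unicode canonical decomposition (NFD), which breaks a pre-composed Hangul character into its constituent jamo.

```python
import unicodedata


def split_into_jamo(character: str) -> list:
    """Return the jamo that make up one pre-composed Hangul character.

    Hypothetical helper for illustration: NFD normalisation replaces a
    pre-composed syllable with its initial, medial and (if any) final jamo.
    """
    return list(unicodedata.normalize("NFD", character))


jamo = split_into_jamo("\uC9D1")           # \uC9D1 is the character 집
print(len(jamo))                           # 3
print([f"U+{ord(j):04X}" for j in jamo])   # ['U+110C', 'U+1175', 'U+11B8']

# A character with no final jamo splits into just two parts.
print(len(split_into_jamo("\uAC00")))      # 2
```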
(Keep reading after the pink table...)
Initial jamo (the pink table)

| Index | Jamo | Name | Roman equiv. | Unicode codepoint (hex) | Unicode codepoint (dec) | Unicode name |
|---|---|---|---|---|---|---|
| 0 | ᄀ | giyeok | g/k | U+1100 | 4352 | kiyeok |
| 1 | ᄁ | ssang-giyeok | kk | U+1101 | 4353 | ssang-kiyeok |
| 2 | ᄂ | nieun | n/l | U+1102 | 4354 | nieun |
| 3 | ᄃ | digeut | d/t | U+1103 | 4355 | tikeut |
| 4 | ᄄ | ssang-digeut | tt | U+1104 | 4356 | ssang-tikeut |
| 5 | ᄅ | rieul | r/l | U+1105 | 4357 | rieul |
| 6 | ᄆ | mieum | m | U+1106 | 4358 | mieum |
| 7 | ᄇ | bieup | b/p | U+1107 | 4359 | pieup |
| 8 | ᄈ | ssang-bieup | pp | U+1108 | 4360 | ssang-pieup |
| 9 | ᄉ | siot | s/t | U+1109 | 4361 | sios |
| 10 | ᄊ | ssang-siot | ss | U+110A | 4362 | ssang-sios |
| 11 | ᄋ | ieung | -/ng | U+110B | 4363 | ieung |
| 12 | ᄌ | jieut | j/ch | U+110C | 4364 | cieuc |
| 13 | ᄍ | ssang-jieut | jj | U+110D | 4365 | ssang-cieuc |
| 14 | ᄎ | chieut | ch | U+110E | 4366 | chieuch |
| 15 | ᄏ | kieuk | k | U+110F | 4367 | khieukh |
| 16 | ᄐ | tieut | t | U+1110 | 4368 | thieuth |
| 17 | ᄑ | pieup | p | U+1111 | 4369 | phieuph |
| 18 | ᄒ | hieut | h | U+1112 | 4370 | hieuh |
The multi-coloured image shows the regions of a generalised Korean character. The pink section is always occupied by a jamo which represents the initial sound at the start of the syllable. The 19 jamo used in this position are shown in the pink table, right or above. The table also gives the Romanised name of the jamo, its basic/assumed sound/s (see below), its Unicode codepoint value (in hex and decimal) and Unicode name. In our example of 집, it is ᄌ (number 12 on the pink chart). (In actual fact, the initial jamo is optional but, if it is not included, it is always represented by a silent placeholder symbol similar to a circle or oval, number 11 on the chart – you can see some of these in our example sentence above.) The initial jamo always has a consonantal sound.
The yellow part of the image is occupied by a jamo which represents the medial (middle) sound of the syllable. The medial jamo is always a vowel sound. The 21 options are shown in the yellow table, right or above. Our example has ᅵ (number 20). Some of these shapes are largely vertical, some largely horizontal, and some have components that are both vertical and horizontal. Hence, the symbols don't always fill the entire yellow area if they don't need to; they may solely fit to the right of the initial jamo if they are mainly vertical in shape, or solely underneath the initial jamo if mainly horizontal.
Final jamo (the blue table)

| Index | Jamo | Name | Roman equiv. | Unicode codepoint (hex) | Unicode codepoint (dec) | Unicode name |
|---|---|---|---|---|---|---|
| 0 | ᆨ | giyeok | g/k | U+11A8 | 4520 | kiyeok |
| 1 | ᆩ | ssang-giyeok | kk | U+11A9 | 4521 | ssang-kiyeok |
| 2 | ᆪ | giyeok-siot | gs | U+11AA | 4522 | kiyeok-sios |
| 3 | ᆫ | nieun | n/l | U+11AB | 4523 | nieun |
| 4 | ᆬ | nieun-jieut | nj | U+11AC | 4524 | nieun-cieuc |
| 5 | ᆭ | nieun-hieut | nh | U+11AD | 4525 | nieun-hieuh |
| 6 | ᆮ | digeut | d/t | U+11AE | 4526 | tikeut |
| 7 | ᆯ | rieul | r/l | U+11AF | 4527 | rieul |
| 8 | ᆰ | rieul-giyeok | lg | U+11B0 | 4528 | rieul-kiyeok |
| 9 | ᆱ | rieul-mieum | lm | U+11B1 | 4529 | rieul-mieum |
| 10 | ᆲ | rieul-bieup | lb | U+11B2 | 4530 | rieul-pieup |
| 11 | ᆳ | rieul-siot | ls | U+11B3 | 4531 | rieul-sios |
| 12 | ᆴ | rieul-tieut | lt | U+11B4 | 4532 | rieul-thieuth |
| 13 | ᆵ | rieul-pieup | lp | U+11B5 | 4533 | rieul-phieuph |
| 14 | ᆶ | rieul-hieut | lh | U+11B6 | 4534 | rieul-hieuh |
| 15 | ᆷ | mieum | m | U+11B7 | 4535 | mieum |
| 16 | ᆸ | bieup | b/p | U+11B8 | 4536 | pieup |
| 17 | ᆹ | bieup-siot | bs | U+11B9 | 4537 | pieup-sios |
| 18 | ᆺ | siot | s/t | U+11BA | 4538 | sios |
| 19 | ᆻ | ssang-siot | ss | U+11BB | 4539 | ssang-sios |
| 20 | ᆼ | ieung | -/ng | U+11BC | 4540 | ieung |
| 21 | ᆽ | jieut | j/ch | U+11BD | 4541 | cieuc |
| 22 | ᆾ | chieut | ch | U+11BE | 4542 | chieuch |
| 23 | ᆿ | kieuk | k | U+11BF | 4543 | khieukh |
| 24 | ᇀ | tieut | t | U+11C0 | 4544 | thieuth |
| 25 | ᇁ | pieup | p | U+11C1 | 4545 | phieuph |
| 26 | ᇂ | hieut | h | U+11C2 | 4546 | hieuh |
The blue part of the image is occupied by the final jamo which, in our example, is ᆸ (number 16). The blue table, right or above, shows a list of the 27 jamo used in this position. This jamo is optional (and, unlike the initial jamo, a placeholder is not needed if omitted). The final jamo, if present, has a consonantal sound. As with the medial jamo, if space in the character is not needed by a jamo, other components are shrunk or stretched out to keep a pleasing rectangular shape to the finished character.
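The index numbers used for the three positions can be tied to code points with a few lines of Python. This is only a sketch: the names `INITIALS`, `MEDIALS` and `FINALS` are invented here, and it assumes that the modern jamo for each position occupy one unbroken run of code points (U+1100 to U+1112, U+1161 to U+1175 and U+11A8 to U+11C2 respectively), so that index 0 is simply the first code point of each run.

```python
# Modern jamo for each position, listed in index order.
INITIALS = [chr(cp) for cp in range(0x1100, 0x1113)]   # 19 initial jamo
MEDIALS = [chr(cp) for cp in range(0x1161, 0x1176)]    # 21 medial jamo
FINALS = [chr(cp) for cp in range(0x11A8, 0x11C3)]     # 27 final jamo

print(len(INITIALS), len(MEDIALS), len(FINALS))        # 19 21 27

# The three jamo of our example character, looked up by index.
for table, index in ((INITIALS, 12), (MEDIALS, 20), (FINALS, 16)):
    jamo = table[index]
    print(index, f"U+{ord(jamo):04X}", ord(jamo))
# 12 U+110C 4364
# 20 U+1175 4469
# 16 U+11B8 4536
```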
Use of software
In the applet, click the appropriate coloured areas to choose the initial (pink), medial (yellow), and the optional final (blue) jamo. The associated table will reflect the jamo chosen and will give both the "combining glyph" version of the finished character and the pre-composed version; both of these can be copy-pasted from the table to use elsewhere.
Note that Unicode uses an earlier version of Romanisation (a variation of McCune-Reischauer) to name the consonantal jamo rather than the Revised Romanisation system used since 2000. Both names are given in the jamo tables above.
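The Unicode names can be checked directly with Python's standard `unicodedata` module, as in the sketch below (the helper name `unicode_name` is made up for this example). The full names also carry a prefix saying which position the jamo occupies: CHOSEONG for initial, JUNGSEONG for medial and JONGSEONG for final.

```python
import unicodedata


def unicode_name(codepoint: int) -> str:
    """Return the official Unicode character name for a code point."""
    return unicodedata.name(chr(codepoint))


for cp in (0x110C, 0x1175, 0x11B8):
    print(f"U+{cp:04X}", unicode_name(cp))
# U+110C HANGUL CHOSEONG CIEUC
# U+1175 HANGUL JUNGSEONG I
# U+11B8 HANGUL JONGSEONG PIEUP
```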
Note on Pronunciation
The fourth column in each of the three coloured tables gives the accepted written Romanisation of the jamo in question (according to the current scheme used in South Korea, called Revised Romanization). For consonants, the Romanisation used often differs depending on whether the jamo is at the start of a syllable or the end of it. For example, bieup tends to be Romanised as a "b" when at the beginning of a syllable (the initial position) but as a "p" when at the end (the final position). A summary of how to choose between these alternative Romanisation forms is given here: https://www.korean.go.kr/front_eng/roman/roman_01.do.
In terms of pronunciation, the Romanisations given are only a very rough guide. The values given often apply only when the jamo is sounded alone (which is relatively rare) and not in combination with other sounds. Preceding and succeeding sounds can (and do) change a jamo's sound in quite complex ways (aside from the initial/final positional changes noted in the paragraph above), as do assorted regional pronunciations. Your Korean will be barely intelligible if you just string the given values together and expect to be understood! See the pronunciation link at the end of this article for more information.
Korean in Unicode
Hangul Jamo (1100h-11FFh). This area contains all the individual jamo mentioned above (19 initial, 21 medial, and 27 final; the three coloured tables mentioned above give both the hex and decimal codepoints of each of these within this Unicode area). These 67 jamo are all that are required for modern Korean, but the area also contains many seldom-used historical jamo, totalling 256 altogether (see https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block) or, for more detail, the official code chart at https://unicode.org/charts/PDF/U1100.pdf). All these can be used in a standalone fashion, but they also have a Unicode property known as "conjoining", whereby multiple jamo can be combined sequentially into a complete finished character. To do this, the constituent jamo must be contiguous, without any spaces or other characters in between. For example, to form our character 집 we can use the following three jamo in direct succession: ᄌ (U+110C, 4364d), ᅵ (U+1175, 4469d), and ᆸ (U+11B8, 4536d). If the software and fonts you are using properly support such combining jamo, you should see the composite character 집 instead of the three individual jamo. (If the character being composed does not have a final jamo, you can simply leave it out.) So our single character can be represented in two ways "behind the scenes" (and indeed is so, if you look at the code of this webpage): as 집, a single pre-composed character, or as 집, three contiguous conjoining jamo. (If those two characters, though represented differently in the code, don't look the same visually, then your system has broken support somewhere along the line!)
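The two "behind the scenes" representations can be compared in Python. The sketch below is illustrative only and uses the standard `unicodedata` module: NFC normalisation combines a sequence of conjoining jamo into the equivalent pre-composed character, which is one way of checking that both forms stand for the same text.

```python
import unicodedata

conjoining = "\u110C\u1175\u11B8"   # three contiguous jamo: initial, medial, final
precomposed = "\uC9D1"              # the single pre-composed character 집

# The two strings differ in storage: three code points versus one.
print(len(conjoining), len(precomposed))   # 3 1
print(conjoining == precomposed)           # False

# After NFC normalisation the jamo sequence becomes the pre-composed form.
print(unicodedata.normalize("NFC", conjoining) == precomposed)   # True

# A character without a final jamo needs only two jamo.
print(unicodedata.normalize("NFC", "\u1100\u1161") == "\uAC00")  # True
```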
Now, having to compose every character in the above fashion would be a pain so, in practice, this Unicode section isn't really used much (except where you need to show individual jamo explicitly, like this page does in places). Instead, there is another, more useful, Unicode area called...
Hangul Syllables (AC00h-D7A3h). This area is the main one used for modern Korean. It contains a single, pre-composed character for every possible modern Korean syllable. In other words, every possible combination of the 67 modern jamo mentioned above. The maths for this gives 19 x 21 x 28 = 11,172 possibilities (NB the 28 allows for 27 jamo plus the possibility of no jamo at all in the final position) even though many theoretically possible combinations are not actually used in Korean words or names. A table of this many characters is huge (for the brave, see https://en.wikipedia.org/wiki/Hangul_Syllables) but luckily the characters are arranged in a rather cunning fashion, and the codepoint of a required pre-composed character can be found by a simple formula based on the indexes of the constituent jamo (as given in the three coloured tables, the pink, yellow and blue, on this page):
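pre-composed code-point in decimal = ( ( initial jamo index ) x 588d + ( medial jamo index ) x 28d + ( final jamo index + 1 if present, otherwise 0 ) ) + 44032d
or, in hex:
pre-composed code-point in hex = ( ( initial jamo index ) x 024Ch + ( medial jamo index ) x 001Ch + ( final jamo index + 1 if present, otherwise 0 ) ) + AC00h
For example, 집 (initial index 12, medial index 20, final index 16) gives 12 x 588 + 20 x 28 + 17 + 44032 = 51665d = C9D1h, while a character with no final jamo, such as 가 (initial index 0, medial index 0), gives 44032d = AC00h.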
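The calculation can be sketched in Python as follows. The function names `compose` and `decompose` are invented for this example. The indexes are the 0-based ones from the coloured tables; a missing final jamo is passed as `None`, and a final jamo that is present is shifted up by one so that the first slot in each group of 28 is the syllable with no final jamo.

```python
from typing import Optional, Tuple

SYLLABLE_BASE = 0xAC00   # 44032d, the first pre-composed syllable
INITIAL_COUNT, MEDIAL_COUNT, FINAL_COUNT = 19, 21, 27


def compose(initial: int, medial: int, final: Optional[int] = None) -> str:
    """Build a pre-composed syllable from 0-based table indexes."""
    if not 0 <= initial < INITIAL_COUNT:
        raise ValueError("initial index must be 0-18")
    if not 0 <= medial < MEDIAL_COUNT:
        raise ValueError("medial index must be 0-20")
    if final is not None and not 0 <= final < FINAL_COUNT:
        raise ValueError("final index must be 0-26, or None")
    # Slot 0 of each group of 28 means "no final jamo".
    final_slot = 0 if final is None else final + 1
    return chr(SYLLABLE_BASE + initial * 588 + medial * 28 + final_slot)


def decompose(syllable: str) -> Tuple[int, int, Optional[int]]:
    """Recover the (initial, medial, final) table indexes of a syllable."""
    offset = ord(syllable) - SYLLABLE_BASE
    if not 0 <= offset < 19 * 21 * 28:
        raise ValueError("not a pre-composed Hangul syllable")
    initial, remainder = divmod(offset, 588)
    medial, final_slot = divmod(remainder, 28)
    return initial, medial, (final_slot - 1 if final_slot else None)


print(hex(ord(compose(12, 20, 16))))   # 0xc9d1
print(decompose("\uC9D1"))             # (12, 20, 16)
print(hex(ord(compose(0, 0))))         # 0xac00
```

Running the formula in reverse with `divmod`, as `decompose` does, is a handy cross-check that the multipliers 588 (21 x 28) and 28 really do partition the area cleanly.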
The final Korean Unicode area worth mentioning in passing is called Hangul Compatibility Jamo (3130h-318Fh). This is similar to the Hangul Jamo area, but is ordered in a different, historical, way. It can usually be ignored.
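One practical difference between the two jamo areas is that compatibility jamo do not conjoin. The sketch below illustrates this with Python's standard `unicodedata` module; the helper name `composes_to_one_character` is made up for this example.

```python
import unicodedata


def composes_to_one_character(text: str) -> bool:
    """True if NFC normalisation turns the sequence into a single character."""
    return len(unicodedata.normalize("NFC", text)) == 1


conjoining = "\u110C\u1175\u11B8"      # jamo from the Hangul Jamo area
compatibility = "\u3148\u3163\u3142"   # similar-looking compatibility jamo

print(composes_to_one_character(conjoining))      # True
print(composes_to_one_character(compatibility))   # False

# Compatibility jamo names carry no positional prefix.
print(unicodedata.name("\u3148"))                 # HANGUL LETTER CIEUC
```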
Further Reading
The Romanisation of Korean: https://www.korean.go.kr/front_eng/roman/roman_01.do
More details on the Romanisation of Korean: https://en.wikipedia.org/wiki/Revised_Romanization_of_Korean
Pronunciation: https://en.wikipedia.org/wiki/Hangul#Letters
Original text portions of this page copyright © Steve Phillips 2024. All rights reserved.