Korean Character Composer

Make-up of Korean Characters

Hangul, the modern Korean writing system, is very regimented. It was originally developed in the 15th century under King Sejong the Great (1397-1450) as a system that the lower classes, in particular, would find easy to learn. It became Korea's official writing system in 1894 but did not, for the most part, supplant the use of Chinese script until the late 20th century.

Let's take a simple example of a sentence in Korean: "That person built a house." In Korean this is: 그 사람은 집을 지었다. As with English and other Western languages, words are divided up by spaces. The third word, corresponding to "house", is 집을 and consists of two characters: 집 and 을. Unlike English, where individual letters can represent part of a syllable or a whole syllable, each character in Korean corresponds directly to one whole syllable. So we can see that 집을 is a two-syllable word (strictly, 집 is "house" and 을 is a particle marking it as the object of the sentence).

But each character gives us further information about the make-up of the syllable concerned. If we take the first character of our word, 집, we see that it appears to be made up of three units: top left, a symbol a bit like a letter π (pi); top right, a vertical line; and something like a half-filled glass of water at the bottom.

These sub-units of a Korean character are known as jamo and there are always either two or three of them per character. The jamo represent the initial, medial (middle) and final sound of the syllable represented by the whole character.
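The split into jamo can also be seen in code. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper name jamo_of is made up for illustration. It relies on Unicode's canonical decomposition (NFD), which breaks a pre-composed Korean character into its two or three jamo.

```python
import unicodedata

def jamo_of(syllable):
    """Return the jamo of one pre-composed Korean character as U+XXXX strings.

    jamo_of is a hypothetical helper name; the real work is done by
    canonical decomposition (NFD) from the standard library.
    """
    decomposed = unicodedata.normalize("NFD", syllable)
    return [f"U+{ord(ch):04X}" for ch in decomposed]

# U+C9D1 is the character 집 (three jamo); U+ADF8 is 그 (two jamo, no final).
print(jamo_of("\uC9D1"))  # ['U+110C', 'U+1175', 'U+11B8']
print(jamo_of("\uADF8"))  # ['U+1100', 'U+1173']
```

The characters are written as escapes so that the example does not depend on how an editor stores the text.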

(Keep reading after the pink table...)

Index  Name  Roman equiv.  Unicode code-point (hex)  Unicode code-point (dec)  Unicode name
0 giyeok g/k U+1100 4352 kiyeok
1 ssang-giyeok kk U+1101 4353 ssang-kiyeok
2 nieun n/l U+1102 4354 nieun
3 digeut d/t U+1103 4355 tikeut
4 ssang-digeut tt U+1104 4356 ssang-tikeut
5 rieul r/l U+1105 4357 rieul
6 mieum m U+1106 4358 mieum
7 bieup b/p U+1107 4359 pieup
8 ssang-bieup pp U+1108 4360 ssang-pieup
9 siot s/t U+1109 4361 sios
10 ssang-siot ss U+110A 4362 ssang-sios
11 ieung -/ng U+110B 4363 ieung
12 jieut j/ch U+110C 4364 cieuc
13 ssang-jieut jj U+110D 4365 ssang-cieuc
14 chieut ch U+110E 4366 chieuch
15 kieuk k U+110F 4367 khieukh
16 tieut t U+1110 4368 thieuth
17 pieup p U+1111 4369 phieuph
18 hieut h U+1112 4370 hieuh

The multi-coloured image shows the regions of a generalised Korean character. The pink section is always occupied by a jamo which represents the initial sound at the start of the syllable. The 19 jamo used in this position are shown in the pink table, right or above. The table also gives the Romanised name of each jamo, its basic/assumed sound or sounds (see below), its Unicode codepoint value (in hex and decimal) and its Unicode name. In our example of 집, it is ㅈ (jieut, number 12 on the pink chart). (In actual fact, the initial sound is optional but, if the syllable has none, the initial position is still filled by a silent placeholder symbol similar to a circle or oval, number 11 on the chart; you can see some of these in our example sentence above.) When it is sounded, the initial jamo is always a consonant.

Index  Name  Roman equiv.  Unicode code-point (hex)  Unicode code-point (dec)  Unicode name
0 a a U+1161 4449 a
1 ae ae U+1162 4450 ae
2 ya ya U+1163 4451 ya
3 yae yae U+1164 4452 yae
4 eo eo U+1165 4453 eo
5 e e U+1166 4454 e
6 yeo yeo U+1167 4455 yeo
7 ye ye U+1168 4456 ye
8 o o U+1169 4457 o
9 wa wa U+116A 4458 wa
10 wae wae U+116B 4459 wae
11 oe oe U+116C 4460 oe
12 yo yo U+116D 4461 yo
13 u u U+116E 4462 u
14 wo/weo wo U+116F 4463 weo
15 we we U+1170 4464 we
16 wi wi U+1171 4465 wi
17 yu yu U+1172 4466 yu
18 eu eu U+1173 4467 eu
19 yi/ui ui U+1174 4468 yi
20 i i U+1175 4469 i

The yellow part of the image is occupied by a jamo which represents the medial (middle) sound of the syllable. The medial jamo is always a vowel sound. The 21 options are shown in the yellow table, right or above. Our example has ㅣ (i, number 20). Some of these shapes are largely vertical, some largely horizontal, and some have both vertical and horizontal components. Hence the symbols don't always fill the entire yellow area: they may sit solely to the right of the initial jamo if they are mainly vertical in shape, or solely underneath it if mainly horizontal.

Index  Name  Roman equiv.  Unicode code-point (hex)  Unicode code-point (dec)  Unicode name
0 giyeok g/k U+11A8 4520 kiyeok
1 ssang-giyeok kk U+11A9 4521 ssang-kiyeok
2 giyeok-siot gs U+11AA 4522 kiyeok-sios
3 nieun n/l U+11AB 4523 nieun
4 nieun-jieut nj U+11AC 4524 nieun-cieuc
5 nieun-hieut nh U+11AD 4525 nieun-hieuh
6 digeut d/t U+11AE 4526 tikeut
7 rieul r/l U+11AF 4527 rieul
8 rieul-giyeok lg U+11B0 4528 rieul-kiyeok
9 rieul-mieum lm U+11B1 4529 rieul-mieum
10 rieul-bieup lb U+11B2 4530 rieul-pieup
11 rieul-siot ls U+11B3 4531 rieul-sios
12 rieul-tieut lt U+11B4 4532 rieul-thieuth
13 rieul-pieup lp U+11B5 4533 rieul-phieuph
14 rieul-hieut lh U+11B6 4534 rieul-hieuh
15 mieum m U+11B7 4535 mieum
16 bieup b/p U+11B8 4536 pieup
17 bieup-siot bs U+11B9 4537 pieup-sios
18 siot s/t U+11BA 4538 sios
19 ssang-siot ss U+11BB 4539 ssang-sios
20 ieung -/ng U+11BC 4540 ieung
21 jieut j/ch U+11BD 4541 cieuc
22 chieut ch U+11BE 4542 chieuch
23 kieuk k U+11BF 4543 khieukh
24 tieut t U+11C0 4544 thieuth
25 pieup p U+11C1 4545 phieuph
26 hieut h U+11C2 4546 hieuh

The blue part of the image is occupied by the final jamo which, in our example, is ㅂ (bieup, number 16). The blue table, right or above, shows a list of the 27 jamo used in this position. This jamo is optional (and, unlike the initial jamo, no placeholder is needed if it is omitted). The final jamo, if present, has a consonantal sound. As with the medial jamo, if a jamo does not need some of the space in the character, the other components are shrunk or stretched out to keep a pleasing rectangular shape to the finished character.
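Because each of the three tables runs in unbroken codepoint order, the index of a jamo is simply its distance from the first entry in its table. A minimal sketch of that relationship follows; the function and constant names are invented for illustration, and the starting codepoints are the index 0 rows of the pink, yellow and blue tables.

```python
# Codepoints of the index 0 entry in each table on this page.
INITIAL_BASE = 0x1100  # pink table: giyeok
MEDIAL_BASE = 0x1161   # yellow table: a
FINAL_BASE = 0x11A8    # blue table: giyeok

def _jamo(base, count, index):
    """Shared range check; returns the jamo at the given table index."""
    if not 0 <= index < count:
        raise ValueError(f"index {index} is outside 0..{count - 1}")
    return chr(base + index)

def initial_jamo(index):
    return _jamo(INITIAL_BASE, 19, index)

def medial_jamo(index):
    return _jamo(MEDIAL_BASE, 21, index)

def final_jamo(index):
    return _jamo(FINAL_BASE, 27, index)

# The three jamo of the example character: indexes 12, 20 and 16.
for ch in (initial_jamo(12), medial_jamo(20), final_jamo(16)):
    print(f"U+{ord(ch):04X}")  # U+110C, U+1175, U+11B8
```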

Use of software

In the applet, click the appropriate coloured areas to choose the initial (pink), medial (yellow), and the optional final (blue) jamo. The associated table will reflect the jamo chosen and will give both the "combining glyph" version of the finished character and the pre-composed version; both of these can be copy-pasted from the table to use elsewhere.

Note that Unicode uses an earlier version of Romanisation (a variation of McCune-Reischauer) to name the consonantal jamo, rather than the Revised Romanisation system used since 2000. Both names are given in the jamo tables above.
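The Unicode names can be looked up programmatically, which makes the difference in naming easy to see. This is a small sketch assuming Python's standard unicodedata module; unicode_name is just an illustrative wrapper. Note that the official names also carry a prefix for the position: CHOSEONG (initial), JUNGSEONG (medial) or JONGSEONG (final).

```python
import unicodedata

def unicode_name(codepoint):
    """Return the official Unicode character name for a codepoint."""
    return unicodedata.name(chr(codepoint))

# The three jamo of the example character: jieut, i and bieup in the
# Revised Romanisation names used in the tables on this page.
for cp in (0x110C, 0x1175, 0x11B8):
    print(f"U+{cp:04X}", unicode_name(cp))
# U+110C HANGUL CHOSEONG CIEUC
# U+1175 HANGUL JUNGSEONG I
# U+11B8 HANGUL JONGSEONG PIEUP
```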


Note on Pronunciation

The "Roman equiv." column in each of the three tables gives the accepted written Romanisation of the jamo in question (according to the current scheme used in South Korea, called Revised Romanization). For consonants, the Romanisation used often differs depending on whether the jamo is at the start of a syllable or the end of it. For example, bieup tends to be Romanised as a "b" when at the beginning of a syllable (the initial position) but as a "p" when at the end (the final position). A summary of how to choose between these alternate Romanisation forms is given here: https://www.korean.go.kr/front_eng/roman/roman_01.do.

In terms of pronunciation, the Romanisations given are only a very, very approximate guide. The values given often only apply when the jamo is sounded alone (which is relatively rare) and not in combination with other sounds. Preceding and succeeding sounds can (and do) change a jamo's sound in quite complex ways (aside from the initial/final positional changes noted in the paragraph above), as do assorted regional pronunciations. Your Korean will be barely intelligible if you just string the given values together and expect to be understood! See the pronunciation link at the end of this article for more information.

Korean in Unicode

Hangul Jamo (1100h-11FFh). This area contains all the individual jamo mentioned above (19 initial, 21 medial, and 27 final; the three tables above give both the hex and decimal codepoints of each of these within this Unicode area). These 67 jamo are all that are required for modern Korean, but the area also contains many seldom-used historical jamo, totalling 256 altogether (see https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block) or, for more detail, the official code chart at https://unicode.org/charts/PDF/U1100.pdf). All of these can be used in a standalone fashion, but they also have a property known as "conjoining", where multiple jamo can be combined sequentially into a complete finished character. To do this, the constituent jamo must be contiguous, without any spaces or other characters in between. For example, to form our character 집 we can use the following three jamo in direct succession: initial jieut (U+110C, 4364d), medial i (U+1175, 4469d), and final bieup (U+11B8, 4536d). If the software and fonts you are using properly support such conjoining jamo, you should see the composite character 집 appear instead of the three individual jamo. (If the character being composed does not have a final jamo, you can simply leave it out.) So our single character can be represented in two ways "behind the scenes" (and indeed is so, if you look at the code of this webpage): as a single pre-composed character (U+C9D1), or as three contiguous conjoining jamo (U+110C U+1175 U+11B8). (If those two forms, though represented differently in the code, don't look the same visually, then your system has broken support somewhere along the line!)
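The two representations can be compared in code. The sketch below assumes Python's standard unicodedata module and uses the codepoints of the example character. The two strings are different sequences of codepoints, but Unicode normalisation converts one into the other: NFC composes conjoining jamo and NFD breaks a pre-composed character apart.

```python
import unicodedata

conjoined = "\u110C\u1175\u11B8"  # initial jieut + medial i + final bieup
precomposed = "\uC9D1"            # the single pre-composed character

# Different lengths and different codepoints, so a plain comparison fails.
print(len(conjoined), len(precomposed))  # 3 1
print(conjoined == precomposed)          # False

# After normalisation the two forms are identical.
print(unicodedata.normalize("NFC", conjoined) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == conjoined)  # True
```

In practice this means that text which may contain either form is best normalised to one of them before it is searched or compared.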

Now, having to compose every character in the above fashion would be a pain, so in practice this Unicode area isn't used much (except where you need to show individual jamo explicitly, as this page does in places). Instead there is another, more useful, Unicode area called...

Hangul Syllables (AC00h-D7A3h). This area is the main one used for modern Korean. It contains a single, pre-composed character for every possible modern Korean syllable; in other words, every possible combination of the 67 modern jamo mentioned above. The maths for this gives 19 x 21 x 28 = 11,172 possibilities (NB the 28 allows for the 27 final jamo plus the possibility of no final jamo at all), even though many theoretically possible combinations are not actually used in Korean words or names. That many characters makes for a huge table (for the brave, see https://en.wikipedia.org/wiki/Hangul_Syllables) but luckily they are arranged in a rather cunning fashion, and the codepoint of the required pre-composed character can be found by a simple formula based on the indexes of the constituent jamo (as given in the three tables, pink, yellow and blue, on this page):

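pre-composed code-point in decimal = ( initial jamo index ) x 588d + ( medial jamo index ) x 28d + ( final jamo index + 1 if a final jamo is present, otherwise 0 ) + 44032d

or, in hex:

pre-composed code-point in hex = ( initial jamo index ) x 024Ch + ( medial jamo index ) x 001Ch + ( final jamo index + 1 if a final jamo is present, otherwise 0 ) + AC00h

For example, 집 (initial 12, medial 20, final 16) gives 12 x 588 + 20 x 28 + 17 + 44032 = 51665d, which is C9D1h.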
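The calculation can be sketched in code as follows. The function names compose and decompose are invented for illustration, and the indexes are those of the pink, yellow and blue tables on this page. One detail needs care: the area starts at AC00h with a character that has no final jamo, so "no final jamo" counts as 0 and a final jamo with blue-table index n counts as n + 1.

```python
HANGUL_BASE = 0xAC00    # first pre-composed character (44032 decimal)
SYLLABLE_COUNT = 11172  # 19 x 21 x 28

def compose(initial, medial, final=None):
    """Build a pre-composed character from table indexes.

    final is the blue-table index (0..26), or None for no final jamo.
    """
    if not 0 <= initial < 19 or not 0 <= medial < 21:
        raise ValueError("initial or medial index out of range")
    if final is not None and not 0 <= final < 27:
        raise ValueError("final index out of range")
    final_offset = 0 if final is None else final + 1
    return chr(HANGUL_BASE + initial * 588 + medial * 28 + final_offset)

def decompose(syllable):
    """Inverse of compose: return (initial, medial, final or None)."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError("not a pre-composed Korean character")
    initial, rest = divmod(offset, 588)
    medial, final_offset = divmod(rest, 28)
    final = None if final_offset == 0 else final_offset - 1
    return initial, medial, final

print(f"U+{ord(compose(12, 20, 16)):04X}")  # U+C9D1
print(f"U+{ord(compose(0, 0)):04X}")        # U+AC00
print(decompose("\uC9D1"))                  # (12, 20, 16)
```

The highest indexes (18, 20, 26) give D7A3h, the last codepoint of the area, which is a handy check that the arithmetic covers the whole range.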

The final Korean Unicode area worth mentioning in passing is called Hangul Compatibility Jamo (3130h-318Fh). This is similar to the Hangul Jamo area, but is ordered in a different, historical, way. It can usually be ignored.
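One practical difference is worth a quick illustration, again as a sketch using Python's standard unicodedata module. The compatibility jamo have Unicode names beginning "HANGUL LETTER" and are standalone letters, not conjoining ones, so a run of them is not joined into a pre-composed character by NFC normalisation.

```python
import unicodedata

# Compatibility versions of jieut, i and bieup.
compat = "\u3148\u3163\u3142"

for ch in compat:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+3148 HANGUL LETTER CIEUC
# U+3163 HANGUL LETTER I
# U+3142 HANGUL LETTER PIEUP

# NFC leaves the three letters as they are; nothing is composed.
print(unicodedata.normalize("NFC", compat) == compat)  # True
```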

Further Reading

The Romanisation of Korean: https://www.korean.go.kr/front_eng/roman/roman_01.do

More details on the Romanisation of Korean: https://en.wikipedia.org/wiki/Revised_Romanization_of_Korean

Pronunciation: https://en.wikipedia.org/wiki/Hangul#Letters


Original text portions of this page copyright © Steve Phillips 2024. All rights reserved.