Korean Character Composer

Modern written Korean uses an alphabet known as Hangul, a very regimented writing system. It was originally developed in the 15th century by the ruler Sejong the Great (1397-1450) as a system that the lower classes, in particular, would find easy to learn. It became Korea's official writing system in 1894 but, for the most part, it did not supplant the use of Chinese script until the late 20th century.

Let's take a simple example of a sentence in Korean: "That person built a house." In Korean this is: 그 사람은 집을 지었다. As in English and other Western languages, words are separated by spaces. The third word, 집을, corresponds to "house" and consists of two characters: 집 and 을. In English a single syllable is usually spelled with several letters. In Korean, by contrast, each character corresponds to exactly one whole syllable. So 집을 is a two-syllable word: 집 is the noun "house" itself, and 을 is a particle marking it as the object of the sentence.

But each character gives us further information about the make-up of the syllable concerned. If we take the first character of our word, 집, we see that it appears to be made up of three units: top left, a symbol a bit like a letter π (pi); top right, a vertical line; and, at the bottom, something like a half-filled glass of water.

These sub-units of a Korean character are known as jamo, and there are always either two or three of them per character. The jamo represent the initial, medial (middle) and, where there is one, final sound of the syllable represented by the whole character.

Index	Jamo	Name	Roman equiv.	Unicode code point (hex)	Unicode code point (dec)	Unicode name
0	ᄀ	giyeok	g/k	U+1100	4352	kiyeok
1	ᄁ	ssang-giyeok	kk	U+1101	4353	ssang-kiyeok
2	ᄂ	nieun	n/l	U+1102	4354	nieun
3	ᄃ	digeut	d/t	U+1103	4355	tikeut
4	ᄄ	ssang-digeut	tt	U+1104	4356	ssang-tikeut
5	ᄅ	rieul	r/l	U+1105	4357	rieul
6	ᄆ	mieum	m	U+1106	4358	mieum
7	ᄇ	bieup	b/p	U+1107	4359	pieup
8	ᄈ	ssang-bieup	pp	U+1108	4360	ssang-pieup
9	ᄉ	siot	s/t	U+1109	4361	sios
10	ᄊ	ssang-siot	ss	U+110A	4362	ssang-sios
11	ᄋ	ieung	-/ng	U+110B	4363	ieung
12	ᄌ	jieut	j/ch	U+110C	4364	cieuc
13	ᄍ	ssang-jieut	jj	U+110D	4365	ssang-cieuc
14	ᄎ	chieut	ch	U+110E	4366	chieuch
15	ᄏ	kieuk	k	U+110F	4367	khieukh
16	ᄐ	tieut	t	U+1110	4368	thieuth
17	ᄑ	pieup	p	U+1111	4369	phieuph
18	ᄒ	hieut	h	U+1112	4370	hieuh

The multi-coloured image shows the regions of a generalised Korean character. The pink section is always occupied by a jamo which represents the initial sound at the start of the syllable. The 19 jamo used in this position are shown in the pink table, right or above. The table also gives the Romanised name of the jamo, its basic/assumed sound/s (see below), its Unicode code point (in hex and decimal) and its Unicode name. In our example of 집, it is ᄌ (number 12 on the pink chart). (Strictly speaking, the initial sound is optional. If there is none, however, the initial position is always filled by a silent placeholder similar to a circle or oval: ᄋ, number 11 on the chart. You can see some of these in our example sentence above, for instance in 은 and 을.) Apart from this silent placeholder, the initial jamo always has a consonantal sound.

Index	Jamo	Name	Roman equiv.	Unicode code point (hex)	Unicode code point (dec)	Unicode name
0	ᅡ	a	a	U+1161	4449	a
1	ᅢ	ae	ae	U+1162	4450	ae
2	ᅣ	ya	ya	U+1163	4451	ya
3	ᅤ	yae	yae	U+1164	4452	yae
4	ᅥ	eo	eo	U+1165	4453	eo
5	ᅦ	e	e	U+1166	4454	e
6	ᅧ	yeo	yeo	U+1167	4455	yeo
7	ᅨ	ye	ye	U+1168	4456	ye
8	ᅩ	o	o	U+1169	4457	o
9	ᅪ	wa	wa	U+116A	4458	wa
10	ᅫ	wae	wae	U+116B	4459	wae
11	ᅬ	oe	oe	U+116C	4460	oe
12	ᅭ	yo	yo	U+116D	4461	yo
13	ᅮ	u	u	U+116E	4462	u
14	ᅯ	wo/weo	wo	U+116F	4463	weo
15	ᅰ	we	we	U+1170	4464	we
16	ᅱ	wi	wi	U+1171	4465	wi
17	ᅲ	yu	yu	U+1172	4466	yu
18	ᅳ	eu	eu	U+1173	4467	eu
19	ᅴ	yi/ui	ui	U+1174	4468	yi
20	ᅵ	i	i	U+1175	4469	i

The yellow part of the image is occupied by a jamo which represents the medial (middle) sound of the syllable. The medial jamo is always a vowel sound. The 21 options are shown in the yellow table, right or above. Our example has ᅵ (number 20). Some of these shapes are largely vertical, some largely horizontal, and some have both vertical and horizontal components. As a result, a medial jamo does not always fill the entire yellow area. A mainly vertical vowel may sit solely to the right of the initial jamo, and a mainly horizontal one solely underneath it.

Index	Jamo	Name	Roman equiv.	Unicode code point (hex)	Unicode code point (dec)	Unicode name
0	ᆨ	giyeok	g/k	U+11A8	4520	kiyeok
1	ᆩ	ssang-giyeok	kk	U+11A9	4521	ssang-kiyeok
2	ᆪ	giyeok-siot	gs	U+11AA	4522	kiyeok-sios
3	ᆫ	nieun	n/l	U+11AB	4523	nieun
4	ᆬ	nieun-jieut	nj	U+11AC	4524	nieun-cieuc
5	ᆭ	nieun-hieut	nh	U+11AD	4525	nieun-hieuh
6	ᆮ	digeut	d/t	U+11AE	4526	tikeut
7	ᆯ	rieul	r/l	U+11AF	4527	rieul
8	ᆰ	rieul-giyeok	lg	U+11B0	4528	rieul-kiyeok
9	ᆱ	rieul-mieum	lm	U+11B1	4529	rieul-mieum
10	ᆲ	rieul-bieup	lb	U+11B2	4530	rieul-pieup
11	ᆳ	rieul-siot	ls	U+11B3	4531	rieul-sios
12	ᆴ	rieul-tieut	lt	U+11B4	4532	rieul-thieuth
13	ᆵ	rieul-pieup	lp	U+11B5	4533	rieul-phieuph
14	ᆶ	rieul-hieut	lh	U+11B6	4534	rieul-hieuh
15	ᆷ	mieum	m	U+11B7	4535	mieum
16	ᆸ	bieup	b/p	U+11B8	4536	pieup
17	ᆹ	bieup-siot	bs	U+11B9	4537	pieup-sios
18	ᆺ	siot	s/t	U+11BA	4538	sios
19	ᆻ	ssang-siot	ss	U+11BB	4539	ssang-sios
20	ᆼ	ieung	-/ng	U+11BC	4540	ieung
21	ᆽ	jieut	j/ch	U+11BD	4541	cieuc
22	ᆾ	chieut	ch	U+11BE	4542	chieuch
23	ᆿ	kieuk	k	U+11BF	4543	khieukh
24	ᇀ	tieut	t	U+11C0	4544	thieuth
25	ᇁ	pieup	p	U+11C1	4545	phieuph
26	ᇂ	hieut	h	U+11C2	4546	hieuh

The blue part of the image is occupied by the final jamo which, in our example, is ᆸ (number 16). The blue table, right or above, lists the 27 jamo used in this position. This jamo is optional (and, unlike the initial jamo, needs no placeholder if omitted). The final jamo, if present, has a consonantal sound. As with the medial jamo, when a jamo does not need all of its area, the other components are shrunk or stretched to keep the finished character a pleasing rectangular shape.

Use of software

In the applet, click the appropriate coloured areas to choose the initial (pink), medial (yellow), and the optional final (blue) jamo. The associated table will reflect the jamo chosen and will give both the "combining glyph" version of the finished character and the pre-composed version; both of these can be copy-pasted from the table to use elsewhere.
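Each coloured table lists its jamo at consecutive code points: pink from U+1100, yellow from U+1161 and blue from U+11A8. Because of this, the "combining" version of a character can be built directly from the three table indexes. The Python sketch below is a hypothetical illustration, not the applet's actual code. The function name `jamo_sequence` and the constant names are invented for this example.

```python
from typing import Optional

# Code point of index 0 in each table (the tables are contiguous ranges).
INITIAL_BASE = 0x1100  # pink table, ᄀ
MEDIAL_BASE = 0x1161   # yellow table, ᅡ
FINAL_BASE = 0x11A8    # blue table, ᆨ


def jamo_sequence(initial: int, medial: int, final: Optional[int] = None) -> str:
    """Return the conjoining jamo for the given 0-based table indexes."""
    if not 0 <= initial < 19:
        raise ValueError("initial index must be 0-18")
    if not 0 <= medial < 21:
        raise ValueError("medial index must be 0-20")
    chars = [chr(INITIAL_BASE + initial), chr(MEDIAL_BASE + medial)]
    if final is not None:
        if not 0 <= final < 27:
            raise ValueError("final index must be 0-26")
        chars.append(chr(FINAL_BASE + final))
    return "".join(chars)


# 집 = ᄌ (pink 12) + ᅵ (yellow 20) + ᆸ (blue 16)
print([f"U+{ord(c):04X}" for c in jamo_sequence(12, 20, 16)])
```

Passing `final=None` leaves the final jamo out, which matches a syllable with no final consonant.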

Note that Unicode names the consonant jamo using an earlier Romanisation scheme (a variation of McCune-Reischauer) rather than the Revised Romanisation system used since 2000. Both names are given in the jamo tables above.
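Python's standard `unicodedata` module can be queried to see the Unicode names side by side with the Revised Romanisation names. This is a minimal sketch assuming Python 3. The small `rr_names` dictionary is hand-copied from the pink table for illustration.

```python
import unicodedata

# Revised Romanisation names, hand-copied from the pink (initial) table.
rr_names = {
    "\u1100": "giyeok",
    "\u1103": "digeut",
    "\u1107": "bieup",
    "\u110c": "jieut",
}

for jamo, rr in rr_names.items():
    # unicodedata.name() returns the character's official Unicode name.
    print(f"U+{ord(jamo):04X}  {rr:<8} {unicodedata.name(jamo)}")
```

On a standard CPython build this should show, for example, "jieut" next to HANGUL CHOSEONG CIEUC.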

Note on Pronunciation

The fourth column in each of the three coloured tables gives the accepted written Romanisation of the jamo in question, according to the current scheme used in South Korea, called Revised Romanization. For consonants, the Romanisation often differs depending on whether the jamo is at the start of a syllable or at the end of it. For example, bieup tends to be Romanised as a "b" at the beginning of a syllable (the initial position) but as a "p" at the end (the final position). A summary of when to use these alternative Romanisation forms is given here: https://www.korean.go.kr/front_eng/roman/roman_01.do.

In terms of pronunciation, the Romanisations given are of only very, very approximate use. The values often apply only when the jamo is sounded alone (which is relatively rare) and not in combination with other sounds. Preceding and following sounds can (and do) change a jamo's sound in quite complex ways, beyond the initial/final positional changes noted in the paragraph above. Assorted regional pronunciations change it too. Your Korean will be barely intelligible if you just string the given values together and expect to be understood! See the link "Pronunciation" at the end of this article for more information.

Korean in Unicode

Hangul Jamo (1100h-11FFh). This area contains all the individual jamo mentioned above: 19 initial, 21 medial and 27 final. The three coloured tables above give the hex and decimal code points of each of these within this Unicode area. These 67 jamo are all that modern Korean requires. The area also contains many seldom-used historical jamo, making 256 altogether (see https://en.wikipedia.org/wiki/Hangul_Jamo_(Unicode_block) or, for more detail, the official code chart at https://unicode.org/charts/PDF/U1100.pdf).

All of these can be used on their own, but they are also "conjoining" jamo: a sequence of them can combine into a complete finished character. To do this, the constituent jamo must be contiguous, with no spaces or other characters in between. For example, to form our character 집 we can use the following three jamo in direct succession: ᄌ (U+110C, 4364d), ᅵ (U+1175, 4469d), and ᆸ (U+11B8, 4536d). If the software and fonts you are using properly support such conjoining jamo, you should see the composite character 집 appear instead of the three individual jamo. (If the character being composed has no final jamo, you simply leave it out.) So our single character can be represented in two ways "behind the scenes" (and indeed is, if you look at the code of this webpage). It can be 집, a single pre-composed character, or 집, three contiguous conjoining jamo. (If those two characters, though represented differently in the code, don't look the same visually, then your system has broken support somewhere along the line!)
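The two representations can be converted into one another with Unicode normalization. The text above does not cover this, but most languages' standard libraries support it. Below is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. NFC composes the conjoining jamo into the pre-composed character, and NFD splits it back apart.

```python
import unicodedata

conjoining = "\u110c\u1175\u11b8"  # ᄌ + ᅵ + ᆸ, three conjoining jamo
precomposed = "\uc9d1"             # 집, one pre-composed character

print(len(conjoining), len(precomposed))  # 3 1
print(conjoining == precomposed)          # False: different code points

# NFC composes the jamo sequence; NFD decomposes the syllable into jamo.
print(unicodedata.normalize("NFC", conjoining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == conjoining)  # True
```

Normalizing both strings to the same form before comparing them is one way to treat the two representations as equal, for example when searching text.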

Now, having to compose every character in this fashion would be a pain. In practice, this Unicode area isn't used much, except where individual jamo need to be shown explicitly, as this page does in places. Instead, there is another, more useful, Unicode area called...

Hangul Syllables (AC00h-D7A3h). This area is the main one used for modern Korean. It contains a single, pre-composed character for every possible modern Korean syllable. In other words, it has one for every combination of an initial jamo (19 options), a medial jamo (21 options) and either a final jamo (27 options) or none. The maths for this gives 19 x 21 x 28 = 11,172 possibilities (NB the 28 allows for the 27 final jamo plus the possibility of no final jamo). Many of these theoretically possible combinations are not actually used in Korean words or names. That makes for a huge table (for the brave, see https://en.wikipedia.org/wiki/Hangul_Syllables). Luckily, the characters are arranged in a rather cunning fashion. The code point of a required pre-composed character can be found by a simple formula based on the indexes of the constituent jamo, as given in the pink, yellow and blue tables on this page:

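pre-composed code-point in decimal = ( initial jamo index ) x 588d + ( medial jamo index ) x 28d + ( final jamo index + 1, or 0 if there is no final jamo ) + 44032d

pre-composed code-point in hex = ( initial jamo index ) x 024Ch + ( medial jamo index ) x 001Ch + ( final jamo index + 1, or 0 if there is no final jamo ) + AC00h

Here 28 is the number of final-position options (27 jamo plus none), and 588 = 21 x 28 is the number of syllables that share each initial jamo. For our example 집: 12 x 588 + 20 x 28 + (16 + 1) + 44032 = 51665d, which is C9D1h.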
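A sketch of this calculation in Python, together with its inverse, follows. It assumes Python 3. The names `compose` and `decompose` are hypothetical helpers, not a library API. The sketch follows the Unicode Standard's own description of the Hangul Syllables area. It uses AC00h as the base and counts the final position as 0 for "no final jamo" and 1 to 27 for the blue table's indexes 0 to 26.

```python
from typing import Optional, Tuple

S_BASE = 0xAC00                           # first pre-composed syllable, 가
INITIAL_COUNT = 19                        # pink table
MEDIAL_COUNT = 21                         # yellow table
FINAL_SLOTS = 28                          # blue table's 27 jamo plus "no final"
PER_INITIAL = MEDIAL_COUNT * FINAL_SLOTS  # 588 syllables per initial jamo
TOTAL = INITIAL_COUNT * PER_INITIAL       # 11,172 syllables in the area


def compose(initial: int, medial: int, final: Optional[int] = None) -> str:
    """Pre-composed syllable from 0-based pink/yellow/blue table indexes."""
    if not (0 <= initial < INITIAL_COUNT and 0 <= medial < MEDIAL_COUNT):
        raise ValueError("initial or medial index out of range")
    if final is not None and not 0 <= final < FINAL_SLOTS - 1:
        raise ValueError("final index out of range")
    # Slot 0 means "no final jamo"; blue-table index n occupies slot n + 1.
    slot = 0 if final is None else final + 1
    return chr(S_BASE + initial * PER_INITIAL + medial * FINAL_SLOTS + slot)


def decompose(syllable: str) -> Tuple[int, int, Optional[int]]:
    """Inverse of compose(): recover the three table indexes."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < TOTAL:
        raise ValueError(f"{syllable!r} is not in the Hangul Syllables area")
    initial, rest = divmod(offset, PER_INITIAL)
    medial, slot = divmod(rest, FINAL_SLOTS)
    return initial, medial, (slot - 1 if slot else None)


print(hex(ord(compose(12, 20, 16))))  # 0xc9d1, i.e. 집
print(decompose("\uc9d1"))            # (12, 20, 16)
print(hex(S_BASE + TOTAL - 1))        # 0xd7a3, the last syllable
```

`decompose()` simply undoes each multiplication in turn with integer division and remainder (`divmod`).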

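The final Korean Unicode area worth mentioning in passing is called Hangul Compatibility Jamo (3130h-318Fh). This contains look-alike versions of the jamo, but they do not conjoin into syllables and make no distinction between initial and final consonants. They also follow a different, historical ordering inherited from an older Korean encoding standard. This area can usually be ignored.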
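Below is one quick way to see that these really are separate characters from the conjoining jamo, assuming Python 3 and its standard `unicodedata` module. The compatibility ㄱ and the initial-position ᄀ look alike but have different code points and names. Unicode's NFKC normalization maps the former onto the latter.

```python
import unicodedata

compat_kiyeok = "\u3131"    # ㄱ from Hangul Compatibility Jamo
choseong_kiyeok = "\u1100"  # ᄀ from Hangul Jamo (initial position)

print(unicodedata.name(compat_kiyeok))    # HANGUL LETTER KIYEOK
print(unicodedata.name(choseong_kiyeok))  # HANGUL CHOSEONG KIYEOK
print(compat_kiyeok == choseong_kiyeok)   # False

# NFKC applies compatibility mappings, folding ㄱ onto the conjoining ᄀ.
print(unicodedata.normalize("NFKC", compat_kiyeok) == choseong_kiyeok)  # True
```

Plain NFC leaves the compatibility letter unchanged. Only the "K" (compatibility) forms perform this mapping.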
