Locale Colemak variant for Tiếng Việt (Vietnamese)!?
Over at the Colemak Discord channel, icedryst and I have had some interesting discussions about Vietnamese script!
Chữ Việt (Vietnamese script), formally named Chữ Quốc Ngữ, is really interesting: Although it's a Latin script, it stacks multiple accents, which poses quite a layout challenge. The reason is that the language has more vowels than the basic Latin alphabet provides. The extra ones are ÂÊÔ (using the circumflex accent), ƠƯ (using horn) and Ă (using breve). Every vowel may also carry one of five tone accents ( ´ ` ̉ . ~ or acute/grave/hook_above/dot_below/tilde), making combos like Ợ and Ấ commonplace.
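To get a feel for how the base vowels and tone accents multiply, here's a minimal Python sketch that builds every vowel/accent combination by letting Unicode NFC normalization compose them. The names `BASE_VOWELS`, `TONES`, `toned` and `all_combinations` are my own for illustration, not from any existing tool.

```python
import unicodedata

# The 12 basic vowel letters (normalized so each is a single code point)
BASE_VOWELS = unicodedata.normalize("NFC", "aâăeêioôơuưy")

# Unicode combining marks for the five tone accents
TONES = {
    "acute": "\u0301",
    "grave": "\u0300",
    "hook_above": "\u0309",
    "tilde": "\u0303",
    "dot_below": "\u0323",
}

def toned(vowel, tone):
    """Put a tone accent on a (possibly already accented) vowel."""
    return unicodedata.normalize("NFC", vowel + TONES[tone])

def all_combinations():
    """Map each base vowel to its plain form plus the five toned forms."""
    return {v: [v] + [toned(v, t) for t in TONES] for v in BASE_VOWELS}

if __name__ == "__main__":
    for forms in all_combinations().values():
        print(" ".join(forms))
```

Every combination comes out as a single precomposed character, so that's 12 × 6 = 72 lowercase vowel forms to make reachable on a layout.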
Personally, I'd think it would be easier to use special base letters such as ƗɨØøɄʉ or αε instead of creating more vowels with accents and then putting another accent on top of that! If special care isn't taken to avoid accent collisions in typography, the result can even be ugly and hard to read. But that can't be helped now I guess; it is what it is.
Since this is a bit complex, the Vietnamese use an Input Method Editor (IME) with a Compose-like method that lets you type a letter combo to get a special letter. As seen below, the Telex method (or VNI in the South) is the most common, and as noted both below and in this post by Tony_VN, the Unikey program is used on Windows. That works, but it isn't perfect. More on that later!
Here are some notes on the language/script from that thread:
Notes about Vietnamese script (from IceDryst)
=============================================
There are 12 basic vowel letters and one special consonant:
    a â ă / đ / e ê / i / o ô ơ / u ư / y

There are five accent tones:
    Rising (ú)
    Rising glottalized (ũ)
    Dipping-rising (ủ)
    Falling (ù)
    Falling glottalized (ụ)

These are all the vowel/accent combinations:
    a á à ả ã ạ
    â ấ ầ ẩ ẫ ậ
    ă ắ ằ ẳ ẵ ặ
    e é è ẻ ẽ ẹ
    ê ế ề ể ễ ệ
    i í ì ỉ ĩ ị
    o ó ò ỏ õ ọ
    ô ố ồ ổ ỗ ộ
    ơ ớ ờ ở ỡ ợ
    u ú ù ủ ũ ụ
    ư ứ ừ ử ữ ự
    y ý ỳ ỷ ỹ ỵ

The Telex method for entry with Latin letters:
- â = aa   ă = aw  |  ´ = s  (1 w/ VNI)
- đ = dd           |  ` = f  (2 --"-- )
- ê = ee           |  ̉ = r  (3 --"-- )
- ô = oo   ơ = ow  |  ~ = x  (4 --"-- )
- ư = uw           |  . = j  (5 --"-- )
- The letters sfrxjw after a vowel are entered by repeating them.
- Delete last accent: z.
- In sum: as = á, asz = a, ass = as, aza = aa(?)

- A Vietnamese word is usually in the form (Consonant)-vowels-(consonant).
- Some words don't have any consonants: E.g., "ao" means "lake", "ai" = "who" etc.
- The letters F Z J W aren't used in Vietnamese; S R X don't appear at the end of words.

- Telex is more popular in the North, VNI in the South.
- If someone cares about Colemak, they will with 99% certainty already be using Telex.
- Vietnamese use standard ANSI keyboards with an Input Method Editor (IME), not special hardware.
- For Windows, Unikey is almost exclusively used. Doesn't work well with PKL (keyboard hook trouble?).
- Telex is faster, but harder to type mixed with English words. You may switch back and forth with a hotkey.
- The Telex method uses same-finger bigram entry of âêôđ and sfrxjw (after a vowel), which is bad.
- Typing 'aw' for ă isn't so comfy either, even though that letter is rareish.
- Telex can take some getting used to for the newcomer!
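The Telex rules in the notes above can be sketched in a few lines of Python. This is a deliberately simplified toy, not how Unikey or any real IME works: it's lowercase only, it always puts the tone on the last vowel typed (real IMEs pick the correct vowel in a cluster), and it skips the repeat-to-escape rule (ass = as). The names `SHAPE`, `TONE`, `strip_tone` and `telex` are my own.

```python
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

def nfd(s):
    return unicodedata.normalize("NFD", s)

# Letter-shaping pairs and tone keys from the Telex table
SHAPE = {"aa": "â", "aw": "ă", "dd": "đ", "ee": "ê",
         "oo": "ô", "ow": "ơ", "uw": "ư"}
TONE = {"s": "\u0301", "f": "\u0300", "r": "\u0309",
        "x": "\u0303", "j": "\u0323"}
TONE_MARKS = set(TONE.values())
VOWELS = set(nfc("aăâeêioôơuưy"))

def strip_tone(ch):
    """Remove any tone accent but keep circumflex/breve/horn."""
    return nfc("".join(c for c in nfd(ch) if c not in TONE_MARKS))

def telex(keys):
    """Convert one lowercase Telex key sequence to Vietnamese text."""
    out = []
    for k in keys:
        if k in TONE or k == "z":
            # Find the last vowel typed so far, if any
            idx = next((i for i in range(len(out) - 1, -1, -1)
                        if strip_tone(out[i]) in VOWELS), None)
            if idx is not None:
                # 'z' has no mark, so it just strips the existing tone
                out[idx] = nfc(strip_tone(out[idx]) + TONE.get(k, ""))
                continue
        elif out and strip_tone(out[-1]) + k in SHAPE:
            # Reshape the previous letter, keeping a tone it already has
            tones = "".join(c for c in nfd(out[-1]) if c in TONE_MARKS)
            out[-1] = nfc(SHAPE[strip_tone(out[-1]) + k] + tones)
            continue
        out.append(k)  # ordinary letter, or a tone key with no vowel yet
    return "".join(out)

if __name__ == "__main__":
    print(telex("tieengs"), telex("vieetj"), telex("dduwowcj"))
```

Note how often the same key gets hit twice in a row (ee, dd) in those inputs; that's the same-finger issue mentioned in the notes.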
I was fascinated, and followed a link to a Vietnamese letter frequency page by Stefan Trost Media. There I started analyzing, and ended up getting Stefan's blessing to use his data to try making a Vietnamese Colemak variant!
Here are my observations of his data, should anyone be interested:
Observations:
- The accents are all common: Acute, grave, dot_below, hook_above, tilde (á à ạ ả ã).
- Most base special letters are common: ô đ ư ê ơ â. Less common: ă.
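For anyone who wants to run this kind of tally on their own text, here's a rough Python sketch. It is not Stefan's method and doesn't reproduce his data; it just decomposes each letter (Unicode NFD) and counts the accent marks. The names `MARK_NAMES` and `mark_counts` are made up for the example.

```python
import unicodedata
from collections import Counter

# Combining marks used in Vietnamese, by name
MARK_NAMES = {
    "\u0301": "acute", "\u0300": "grave", "\u0309": "hook_above",
    "\u0303": "tilde", "\u0323": "dot_below",
    "\u0302": "circumflex", "\u0306": "breve", "\u031b": "horn",
}

def mark_counts(text):
    """Count accent marks (and the letter đ) in a piece of text."""
    counts = Counter()
    for ch in unicodedata.normalize("NFD", text.lower()):
        if ch in MARK_NAMES:
            counts[MARK_NAMES[ch]] += 1
        elif ch == "\u0111":  # đ has no decomposition, so count it directly
            counts["d_stroke"] += 1
    return counts

if __name__ == "__main__":
    print(mark_counts("Tiếng Việt"))
```

Fed a large enough corpus, counts like these tell you which accents deserve the best key positions.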
Making a Vietnamese Colemak[eD]
For Windows and Linux, the Colemak[eD] layout has all the dead key functionality needed to type Vietnamese. For instance, to type ự I'll type AltGr+3,1 then u. This works okay for separate words, but I don't think that it's practical enough for typing Vietnamese text: The common accents are mostly on AltGr+<number row key> which is a bit out of the way.
Standard Colemak[eD] accents needed for Tiếng Việt typing (all with AltGr+<key>):
1 ọ 2 ỏ
3 ơ 6 ô
9 ă ' ó
\ ò / đ
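To illustrate how stacked dead keys combine, here's a small Python sketch that mimics the AltGr dead keys from the table above (accent keys first, then the base letter). It only models the output, not how Colemak[eD] or PKL actually implement dead keys, and it leaves out AltGr+/ since đ doesn't decompose into d plus a combining mark. `DEAD_KEYS` and `compose` are illustrative names.

```python
import unicodedata

# AltGr+<key> dead keys from the table, mapped to Unicode combining marks
DEAD_KEYS = {
    "1": "\u0323",   # dot below
    "2": "\u0309",   # hook above
    "3": "\u031b",   # horn
    "6": "\u0302",   # circumflex
    "9": "\u0306",   # breve
    "'": "\u0301",   # acute
    "\\": "\u0300",  # grave
}

def compose(dead_keys, base):
    """Apply a sequence of AltGr dead keys to the base letter that follows."""
    marks = "".join(DEAD_KEYS[k] for k in dead_keys)
    return unicodedata.normalize("NFC", base + marks)

if __name__ == "__main__":
    # AltGr+3, AltGr+1, then u
    print(compose(["3", "1"], "u"))
```

That's three key presses, two of them on the number row with AltGr held, for a single letter, which is why I doubt it's practical for running text.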
Currently, PortableKeyboardLayout (PKL) for Windows doesn't support the Telex method but it may soon. Today it seems that you can't use PKL with Unikey which is a shame as you won't get Extend and all that other goodness then! I'd like PKL to support both Telex and a Colemak[eD] Vietnamese variant.
I realize that most Vietnamese typists are used to the Telex method and won't want to change. But since you're reading this, you've proven that you're interested in going further than most. ;-)
My vision is, like with my other Colemak locale variants, to keep Colemak intact so you can type English with it while facilitating Vietnamese typing too! And if possible, more ergonomic than Telex. A tall order, but let's try! ^_^
Proposals for Colemak[eD]-Vi
- A special letter key for, e.g., Ôô? Common, but seems almost unfair to the many others...?
- Could use the VK_102 ISO key for that. Problem: Most Vietnamese don't have an ISO keyboard!
- AltGr+adeu for âđêư. AltGr+w for ă.
- O is more tricky. AltGr+o for ô, AltGr+y or AltGr+i for ơ ?
- The i position is right next to o, but y is a less common vowel which may help here.
- Since ơi is a common bigram, it's better to have ơ on y and use the middle finger for ơ in ơi.
- Consider specific bigrams, particularly with accented êơư!
- Easy 'e, easy 'o, okay 'u
- Easy ;e, easy ;u, hard ;o – but ỡ is _very_ rare!
- Easy .e, okay .u, okay .o
- Easy ,o, hard ,u, hard ,e – and ể is 0.6%. Should not use the comma dead key then!
- Easy 'y, easy ;y, okay .y (w/ alt. fingering); the same with i.
- Could reuse the good acute dead key from Colemak[eD]. It's the most common, but we need 4 more.
- Swap some dead keys w/ the eD ones to make them more accessible? But that's a bit confusing.
- tilde and diaeresis (~ ;)?
- dot_below and dot_above (! .)?
My suggestion for now:
Letters:
========
đ on AltGr+ d (or VK_102 if you have an ISO keyboard)
â ê ô on AltGr+ a e o
ơ ư on AltGr+ y u
ă on AltGr+ w
Accents:
========
´` on [] brackets - intuitive rising/falling pattern; easy to hit
~. on {} braces - these are rising/falling too (but glottalized)
? on AltGr-' - easy reach
Having the common accents on brackets and using RAlt(AltGr) a lot, a Wide ergo mod is recommended!
Maybe it's better to have ~. accents on AltGr+brackets instead; see below.
Again, one problem may be that Vietnamese typists aren't used to the AltGr key (on RAlt). Particularly when using a Wide mod, it's very nice I think. But it's somewhat of a taste thing I guess.
A real hurdle is that people are used to thinking vowel-accent and not accent-vowel when spelling: B-a-`-n = Bàn; T-i-ê-´-n = Tiến etc. So dead keys will take some getting used to.
Composing like the Telex method has its advantages too, but they shrink when it relies on same-finger bigrams, and they disappear when you need to write a mix of non-Vietnamese and Vietnamese words.
One suggestion is an improved "Telex" way, avoiding the same-finger bigrams and other problems! It's an interesting thought, but I honestly don't think it'd become popular since the standard Telex method is so ingrained. And the problem of switching method would still persist.
So, what do the people think? Chime in, please! Not in Tiếng Việt though, as I don't actually know this fascinating language! ^_^
[edit: Accents suggestion v2; Special dead keys]
I think that shifted accents likely aren't good enough for tilde and dot_below, as they're common and a complex pattern of shifting and using AltGr gets confusing.
Accents v2:
==========
´` on [] brackets
~. on , . (dual-role dead key)
? on ; --"--
(Or would it be better with tilde on semicolon and hook on comma? ~. are related as up/down, like ´` are. But comfort above all!)
The idea would be that semicolon, comma and period release their base char when not modifying vowels. PKL can support this.
For writing sequences of these symbols, their base versions should also be available. These could be on AltGr+<key>.
Even though ,u and ,e aren't so comfy bigrams as such, they aren't so bad if the comma is unmodified.
A downside to that is that it's harder to implement outside of PKL. Not sure how to do it with XKB. Might be okay for MSKLC.
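The dual-role idea can be sketched as a tiny state machine in Python. This is only a model of the behaviour I have in mind, not PKL code: the key waits, and if a vowel follows it becomes an accent; otherwise it releases its own character. It handles lowercase only, and the key assignment follows Accents v2 (tilde on comma, dot_below on period, hook on semicolon). `DUAL_KEYS` and `type_keys` are names invented here.

```python
import unicodedata

# Accents v2: dual-role keys and the accent each applies to a following vowel
DUAL_KEYS = {",": "\u0303", ".": "\u0323", ";": "\u0309"}
VOWELS = set(unicodedata.normalize("NFC", "aăâeêioôơuưy"))

def type_keys(keys):
    """Simulate typing; 'keys' is a sequence of single characters."""
    out = []
    pending = None  # a dual-role key waiting to see what comes next
    for k in keys:
        if pending is not None:
            if k in VOWELS:
                # Next key is a vowel: act as a dead key
                out.append(unicodedata.normalize("NFC", k + DUAL_KEYS[pending]))
                pending = None
                continue
            out.append(pending)  # not a vowel: release the base character
            pending = None
        if k in DUAL_KEYS:
            pending = k
        else:
            out.append(k)
    if pending is not None:
        out.append(pending)  # release at end of input
    return "".join(out)

if __name__ == "__main__":
    print(type_keys(",a"), type_keys("a, b"), type_keys(".o"))
```

One catch the sketch makes obvious: a period directly followed by a vowel (no space) will always turn into an accent, so the plain symbols are still needed somewhere, such as on AltGr.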
[edit 2018-07: Accent suggestion v3–4; AltGr + good left-hand keys]
We're still working on this, testing out variants to see what works best. Getting rid of the same-key bigrams and making something that works both for English and Việt still seems like a worthy cause!
* Special letters are on AltGr(RAlt) plus letters A D E I O U Y W (I and Y are duplicates, to test which one is better)
* Version 1: Accents ´`?~. are on bracket keys as well as AltGr (or Shift) plus brackets and '
* Version 2: Accents ´`?~. are on bracket keys and ; , . as special dual-function dead keys releasing the base character.
* Version 3: Accents ´`?~. are on AltGr plus S T R F P. For SR this coincides with Telex; the rest is geometric/ergonomic.
* Version 4: Accents ´`?~. are on bracket keys as well as AltGr plus R S T.
So, the latest version goes like this then:
Letters:
========
đ on AltGr+ d
â ă on AltGr+ a w
ê ô on AltGr+ e o
ơ ư on AltGr+ i u
Accents v4:
==========
´ ` on [ ] unmodified brackets
~ . on AltGr+ s t
? on AltGr+ r
(Version 3 had ~ . on f p, with ´ ` on s t.)
It seems to me that this will produce little conflict with the normal and special vowels, but some testing remains.
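As a testing aid, here's a Python sketch that turns a Vietnamese word into the key presses it would need under the v4 proposal, and checks that the AltGr letter keys and AltGr accent keys don't overlap. It assumes the tone dead key is pressed before the vowel (accent-vowel order) and uses the ơ-on-i variant; `LETTER_KEYS`, `ACCENT_KEYS` and `keystrokes` are illustrative names, lowercase only.

```python
import unicodedata

# Special letters: AltGr + key (v4, with ơ on i)
LETTER_KEYS = {
    "\u0111": "d",  # đ
    "\u00e2": "a",  # â
    "\u0103": "w",  # ă
    "\u00ea": "e",  # ê
    "\u00f4": "o",  # ô
    "\u01a1": "i",  # ơ
    "\u01b0": "u",  # ư
}

# Tone accents v4: combining mark -> key press
ACCENT_KEYS = {
    "\u0301": "[",        # acute
    "\u0300": "]",        # grave
    "\u0303": "AltGr+s",  # tilde
    "\u0323": "AltGr+t",  # dot below
    "\u0309": "AltGr+r",  # hook above
}

def keystrokes(word):
    """List the key presses for a word, tone dead key before its vowel."""
    keys = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        parts = unicodedata.normalize("NFD", ch)
        tones = [m for m in parts if m in ACCENT_KEYS]
        base = unicodedata.normalize(
            "NFC", "".join(m for m in parts if m not in ACCENT_KEYS))
        keys.extend(ACCENT_KEYS[m] for m in tones)
        keys.append("AltGr+" + LETTER_KEYS[base] if base in LETTER_KEYS else base)
    return keys

if __name__ == "__main__":
    print(keystrokes("việt"))
```

Running real word lists through something like this should show which key sequences come up most, and so where the uncomfortable ones hide.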
[Update:]
I can now generate help images for any layout in PKL, so here's what the proposed layout looks like now. Note the redundancies on Colemak FP/ST, Y/I and R/apostrophe; in the testing phase these are/were still being evaluated.