SYM – a symbol key mod for the enthusiasts!
I feel that the semicolon gets a too good position in standard Colemak. It must be a remnant of the insane position it has in QWERTY that puts it inside the letter block so to speak, where it doesn't deserve to be. The three symbol keys that do deserve a position among the letters are period, comma and apostrophe/quote, since they occur frequently in text. Next up is the hyphen-minus that's quite common and too far away on the top row.
Furthermore, by using the Wide mod I've already broken Shai's principle of leaving the symbol keys alone in the name of simplicity. The ANSI Wide mod moves the apostrophe up and the /? key to the lower middle, while the ISO Wide mod moves /? up; both move the brackets and plus/equals keys.
My feeling is that a semicolon-slash-apostrophe-hyphen remap could be quite nice! Trying it out, it seems to make the apostrophe bigrams noticeably more comfy. It;s easier to stretch that pinky out upwards than opening the hand outwards, that;s my clear impression of what;s what. ¯\_(ツ)_/¯
But while we're at that... Let's take a closer look at punctuation vs letter frequencies in English! ( のvの) c[_]
On the McDickens letter frequency page we see the following relative frequencies for English, after the initial SPC e t a o i n s r h l d c u m:
Excluding code: g f p w y ENT b , . v k ' " - x 0 j 1 q 2 z ) ( : ! ? 5 ; 3 4 9 / 8 6 7 [ ] ...
Including code: f g p y w ENT b , . v k - " _ ' x ) ( ; 0 j 1 q = 2 : z / * ! ? $ 3 5 > { } ...
It's clear that these punctuation keys are more common than the rare letters x j q z (having <0.16% usage each, all others having >0.8%).
The below frequency estimates are based on the numbers on Wikipedia as well as the analysis used in SteveP99's keyboard layout analyzer based on, respectively, a Project Gutenberg book lexicon and the Linux source code base:
B CM PD V K QU MN X J
bB ,< .> vV kK '" -_ xX jJ
1.5 1.2 1.3 1.0 0.8 1.0 0.5* .15 .15 (Approx. % usage. The -_ key varies w/ coding, 0.3-0.8?)
1.5 1.8 1.2 0.9 0.8 1.0 0.3 .14 .14 (SteveP's % usage, based on book analysis.)
1.5 1.7 1.1 1.3 0.9 0.4 8.1 1.5 .06 (SteveP's % usage, based on code/documentation. Underscore is 7.2!)
The semi-/colon (SC) and slash/question (SL) keys are more of a toss-up. These don't quite deserve their good positions when entering text, as frequencywise they are on par with or worse than several number row symbols – especially the parentheses. It's different for coding as particularly the semicolon is quite common in some languages. However, my impression is that the text-related symbols are more important for flow. I think people may be overreporting the importance of the semi-/colon, unless you type almost exclusively code. If you do so though, note the massive role that the underscore can play!
Since backslash is handled differently on ISO and ANSI boards, doesn't have a really good position on either and has a different key cap size on the latter, I'm uncertain whether to include it in remaps. I feel that the brackets should be left alone since they serve many purposes in locale layouts and since they're cool in the middle of the keyboard using the Wide mod. If anyone still wants to improve their role, I suggest a simple brackets-parentheses swap. So in sum, my candidates for worse positions when promoting the more common symbol keys are:
SC SL Q Z PL BS
;: /? qQ zZ =+ \|
.13 .16 .10 .08 .06? --
.29 .15 .10 .06 .00 .00 (Note the huge variation in SC frequency! SteveP reports 0.22+0.07 %.)
1.4 0.6 0.3 0.2 0.9 0.2
I've included the plus/equals (PL) key since even though it's quite rare in text, it goes to the middle of the board in the Wide mod and that's ugly and unintuitive. Maybe it could take MN's place, and SL or BS its old place? Then, PL would remain close to MN as it was before but just on a different row – like Dvorak's punctuation placements have it.
The idea of a rotation mod to make symbol placements better was discussed both in a Q > ; > ' rotation topic and in another topic where it was suggested to make it more like the Dvorak symbol placements. I don't think that's a goal in itself, but it's clear to me that Mr. Dvorak saw the importance of the hyphen-minus key.
As for Snth's idea of including the letter Q in a rotation with the quote key, I don't see any real benefit to making QU a roll. It's a comfortable bigram already, using a nice hand alternation. Even Snth has said he doesn't believe in it anymore.
So, these cycles look interesting:
For ISO boards, as two separate loops:
QU > SC
MN > SL > BS > PL
For ISO boards, leaving SL alone:
QU > SC > BS > PL > MN
For ANSI boards, leaving BS alone:
QU > SC > SL > PL > MN
Optional addition: Swap parentheses and brackets
() > [] (not a key swap so it's a bit harder to do with some implementations)
Using vanilla Colemak, on an ortho board:
6 7 8 9 0 = \
j l u y ' [ ] Sym(QuMn) mod
h n e i o - ; (leaving the slash alone)
k m , . / _____
With the Wide mod, on a staggered ISO board:
\ 7 8 9 0 =
[ j l u y ' - Wide-Sym mod
] h n e i o ; (both the quote and minus loops)
/ k m , . _____
The full CurlAngleWideSym mod on an ANSI board, adding the parenthesis/bracket swap:
1 2 3 4 5 6 / 7 8 9 0 =
Q W F P B ( J L U Y ' - \ Cmk-CAWS-ANSI
A R S T G ) K N E I O (w/ parentheses)
X C D V Z ; M H , .
Any insights and thoughts? I'm thinking this could be a "Symbol Rotation" mod on top of the other mods we use. Some optimizers might like it, while other Colemak users wouldn't bother. Is it time for Colemak-CAWS domination...?
Warning: Bad joke inside. Consider CAWS and effect. You have been warned...
Colemak-CAWS: The Sym symbol rotation together with the CurlAngleWide ergo mods on an ISO board.
[edit 2020-03:]
I tried this "Sym" mod out a little but at first I found it too confusing to be worth my effort. Maybe others will be far more enthusiastic! However, I've also tested the bang-for-buck version of it: Only the QU <> SC swap. That's a very light mod from standard Colemak, only moving one more key. In the spirit of the DH mod, it seeks to reduce lateral stretches while changing no fingerings. Heavy coders may not want to bother with it, but they should certainly consider the minus/underscore loop!
[edit 2020-04:]
I'm now using the full Sym mod and liking it. Both the apostrophe/quote and hyphen keys feel more naturally placed, especially for normal text. Coders on ANSI boards may dislike the position of semicolon in Colemak-CAWS but for ISO I think it's fine. Now, as mentioned above, swapping the brackets and parentheses could be nice? How deeply into the rabbit hole would you like to venture...?  ̄(=⌒ᆺ⌒=) ̄
[edit 2020-09:]
I feel the ANSI semicolon may have drawn the short straw here, and Slash should probably have the bottom row like with ISO. Would it be better to have the semicolon on the BS key after all?
For ANSI boards, 1) leaving BS alone or 2) putting semicolon on the BS key:
QU > SC > SL > PL > MN
QU > SC > BS > PL > MN > SL
The full CurlAngleWideSym mod on an ANSI board, with semicolon on the BS key:
1 2 3 4 5 6 \ 7 8 9 0 =
Q W F P B [ J L U Y ' - ; Cmk-CAWS-ANSI
A R S T G ] M N E I O
X C D V Z / K H , .
[edit 2020-12:]
The non-Wide mods could benefit from an easier solution: Lifting the brackets up! They aren't common at all unless you swap them for the parentheses – in which case they do become more common than SC SL PL etc. If, on the other hand, we stick to moving keys and not touching their state mappings, this should be a nice non-Wide Sym variant both for ISO and ANSI:
6 7 8 9 0 [ ]
j l u y ' - = Sym mod
h n e i o ; \ w/ bracket lift
k m , . / _____
The bracket lift swaps are quite unintrusive too. Colemak modders like Nyfee have already used them. We're down to three swaps (LB-MN RB-PL SC-QU) instead of cycles, and the Wide and non-Wide variants of the Sym mod become more similar with respect to the most common symbols.
[edit 2021-01:]
Same idea as before, but preserving the plus/equals key next to zero on the number row. The RB key stays in place this way:
7 8 9 0 = [
j l u y ' - ] Sym mod
h n e i o ; \ w/ left bracket lift
k m , . / _____
The Wide and non-Wide mods are more consistent this way, and the brackets are in the same relative positions.
[edit 2021-10:]
Here's another interesting source: Vivian Cook's page on English punctuation frequencies.
The page reports frequency per 1000 words, which should be 50× the frequency per character in percent given an average word length of 5 characters. It omits some symbols like the underscore and slash, but the results are nearly identical to what we've already seen:
PD CM QU MN SC
. , '" - ;:
1.3 1.2 1.0 .30 .13 Vivian Cook English punctuation frequencies (in %, so reported frequency / 50)
*** Learn Colemak in 2–5 steps with Tarmak! ***
*** Check out my Big Bag of Keyboard Tricks for Win/Linux/TMK... ***