    Analysis of Colemak & Mod-DH for some European languages

    I recently gave my analyzer the ability to load a different set of monogram and bigram frequencies. This makes it possible to analyze layouts for other languages, whose letter and bigram frequencies may differ somewhat from English. You can try this yourself on my updated layout analyzer, but the results are also summarized here:

    Note: to generate these, I used frequency tables for the language in question, but the layouts tested are just the standard Colemak (DH), Dvorak and Qwerty layouts, without the special characters or national variants (QWERTZ, AZERTY, etc.) that some of these languages use. So it's not as comprehensive an analysis as it might be, but hopefully still a useful indicator. In the tables, sf-bigrams is the percentage of same-finger bigrams, and score is the analyzer's overall score (lower is better).


    Layout      sf-bigrams  score
    colemak_dh    4.27%     1.714
    colemak       4.27%     1.750
    dvorak        4.97%     2.015
    qwerty       11.83%     2.348


    Layout      sf-bigrams  score
    colemak_dh    4.98%     1.695  
    colemak       4.98%     1.759  
    dvorak        3.49%     1.900  
    qwerty        9.93%     2.385  


    Layout      sf-bigrams  score
    colemak_dh    3.60%     1.630  
    colemak       3.60%     1.644  
    dvorak        3.56%     1.894  
    qwerty        9.25%     2.348  


    Layout      sf-bigrams  score
    colemak_dh    3.25%     1.655  
    colemak       3.25%     1.683  
    dvorak        2.48%     1.889  
    qwerty        9.65%     2.311  


    Layout      sf-bigrams  score
    colemak_dh    3.80%     1.929  
    colemak       3.80%     1.948  
    dvorak        6.82%     2.174  
    qwerty        9.43%     2.474  


    Layout      sf-bigrams  score
    colemak_dh    4.13%     1.708
    colemak       4.13%     1.730
    dvorak        4.31%     1.983
    qwerty       10.15%     2.316

    On this quick test, it's interesting to note that:

    - Qwerty is by far the worst in every language :P
    - The language gaining least from Colemak is Polish. French and German gain the most - presumably because these languages are "closer" to English.
    - There is a noticeable same-finger penalty in all languages, but at least in Colemak it's always less severe than Qwerty.
    - Colemak beats Dvorak on score in each language, although interestingly, Dvorak has a low same-finger rate for Spanish!
    - Mod-DH still comes out the best (of the 4 layouts tested) in each language (yay!)

    Those who know some of these languages might be able to interpret the results better than I can.
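As a rough illustration of what the sf-bigrams column measures, here is a minimal Python sketch that computes a same-finger bigram percentage from a bigram frequency table. It is not the analyzer's actual code, and its numbers will not necessarily match the tables above: it assumes standard touch-typing finger assignment on the 10 main columns (each index finger covering two columns), skips bigrams containing characters outside the three letter rows, and does not count repeated letters such as "ll" as same-finger. The names `LAYOUTS`, `finger_map` and `same_finger_percent` are hypothetical.

```python
from typing import Dict, List

# Assumed finger for each of the 10 main columns:
# 0 = left pinky, 1 = left ring, 2 = left middle, 3 = left index,
# 4 = right index, 5 = right middle, 6 = right ring, 7 = right pinky.
COLUMN_FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]

# Top, home and bottom letter rows of each layout (ANSI, no angle mod).
LAYOUTS: Dict[str, List[str]] = {
    "qwerty": ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"],
    "colemak": ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"],
    "dvorak": ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"],
}


def finger_map(rows: List[str]) -> Dict[str, int]:
    """Map each character on the layout to the finger that types it."""
    return {ch: COLUMN_FINGER[col] for row in rows for col, ch in enumerate(row)}


def same_finger_percent(rows: List[str], bigram_freqs: Dict[str, float]) -> float:
    """Percentage of bigram frequency typed by one finger on two different keys."""
    fingers = finger_map(rows)
    total = 0.0
    same = 0.0
    for bigram, freq in bigram_freqs.items():
        a, b = bigram[0], bigram[1]
        if a not in fingers or b not in fingers:
            continue  # ignore bigrams with characters not on the layout
        total += freq
        # Repeated letters (a == b) are excluded from the same-finger count.
        if a != b and fingers[a] == fingers[b]:
            same += freq
    return 100.0 * same / total if total else 0.0


if __name__ == "__main__":
    # Toy counts for illustration only, not real language data.
    toy_freqs = {"ed": 10, "de": 5, "th": 20, "ll": 3}
    for name, rows in LAYOUTS.items():
        print(f"{name:8s} {same_finger_percent(rows, toy_freqs):.2f}%")
```

Loading a different language then amounts to passing a different `bigram_freqs` table, while the layout definitions stay the same.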

    Nice work, Steve!

    Spanish is very CV (single consonant + single vowel) based, so the low same-finger rate in Dvorak makes sense: the vowels are all on the left hand, so consonant-vowel pairs tend to alternate hands. I imagine that languages such as Japanese would give similar results if they were written in the Latin alphabet. There is a Spanish version of Dvorak that swaps R and H, which might score a little differently. This is an old post, but an interesting one. I was trying to find the single best layout for both English and Spanish, since I write a lot in both.
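To make the hand-alternation point concrete, here is a minimal sketch that measures what share of adjacent character pairs in a text are typed by alternating hands. It assumes the five left columns belong to the left hand and the five right columns to the right; pairs involving characters not on the three letter rows (spaces, accented letters such as á or ñ) are skipped. The name `alternation_percent` is made up for this example.

```python
from typing import Dict, List

# Top, home and bottom letter rows (ANSI).
QWERTY: List[str] = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
DVORAK: List[str] = ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"]


def hand_map(rows: List[str]) -> Dict[str, str]:
    """Assume columns 0-4 are typed by the left hand and 5-9 by the right."""
    return {ch: ("L" if col < 5 else "R") for row in rows for col, ch in enumerate(row)}


def alternation_percent(rows: List[str], text: str) -> float:
    """Percentage of adjacent on-layout character pairs typed by different hands."""
    hands = hand_map(rows)
    text = text.lower()
    pairs = 0
    alternating = 0
    for a, b in zip(text, text[1:]):
        if a in hands and b in hands:  # skip pairs involving spaces etc.
            pairs += 1
            if hands[a] != hands[b]:
                alternating += 1
    return 100.0 * alternating / pairs if pairs else 0.0


if __name__ == "__main__":
    sample = "el perro come la comida"
    print(f"dvorak {alternation_percent(DVORAK, sample):.1f}%")
    print(f"qwerty {alternation_percent(QWERTY, sample):.1f}%")
```

On a word like "hola", every pair alternates hands on Dvorak, while on Qwerty only the final "la" does. Higher alternation generally leaves fewer opportunities for same-finger bigrams, though this sketch does not measure those directly.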

