
Colemak and italian

  • Started by rebus
  • 36 Replies:
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

Thanks lalop. I had to clean the output for Unicode (Arch Linux is the distro I use; I can't imagine it makes a difference). I'll look into my settings.

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

Problem more or less solved. I guess it was some interaction between mistakes in my corpus (three character strings in my digrams, and strange non-standard characters). Anyway, the resulting layouts finally score the best in Patorjk's keyboard analyzer, using the same Dutch + some English corpus. Rolls, same hand etc. also have more credible values (4% outward rolls, 9% inward rolls, >0% home jumps etc.). Plus: a normal comma!

The keyboard is:
u.opy xclmv
aienh gdrts
;,?fq bkjzw

If I make home jumps more expensive, two alternative layouts pop up:

/  P  U  C  F   ;  M  S  L  H
A  I  E  N  K   G  T  D  R  O
<  >  Y  B  Q   Z  V  W  J  X

.  p  u  c  f   :  m  s  l  h
a  i  e  n  k   g  t  d  r  o
?  ,  y  b  q   z  v  w  j  x

Fitness:       114225
Distance:      89720
Finger work:   0
Inward rolls:  9.10%
Outward rolls: 2.21%
Same hand:     42.38%
Same finger:   1.42%
Row change:    14.73%
Home jump:     0.25%
Ring jump:     1.35%
To center:     4.92%

Time elapsed: 0 hours, 0 minutes, 8 seconds
Number of rounds in greatToBest() is now 512.
Chance to use previous layout is now 0.634732.
Number of swaps between rounds is now 14.
Number of rounds in greatToBest() is now 1024.

***Found from greatToBest()***

Hands: 53% 46%
Fingers: 9.0% 9.0% 13% 23% 0.00% 0.00% 16% 15% 7.0% 8.0%

/  B  O  P  Y   F  K  L  H  C
A  T  I  E  U   V  D  N  S  R
<  >  ;  J  Q   M  G  W  Z  X

.  b  o  p  y   f  k  l  h  c
a  t  i  e  u   v  d  n  s  r
?  :  ,  j  q   m  g  w  z  x

Fitness:       114080
Distance:      94310
Finger work:   975
Inward rolls:  6.67%
Outward rolls: 2.58%
Same hand:     33.25%
Same finger:   1.71%
Row change:    11.62%
Home jump:     0.29%
Ring jump:     0.62%
To center:     2.46%


Setting the home jump cost to normal, but honoring rolls more, this layout comes up:

J  K  U  W  ;   Q  V  H  C  /
A  D  E  N  S   G  T  I  R  O
<  Z  Y  M  X   B  L  P  F  >

j  k  u  w  ?   q  v  h  c  .
a  d  e  n  s   g  t  i  r  o
:  z  y  m  x   b  l  p  f  ,

Fitness:       94855
Distance:      90915
Finger work:   0
Inward rolls:  12.58%
Outward rolls: 3.40%
Same hand:     48.42%
Same finger:   1.75%
Row change:    17.42%
Home jump:     0.27%
Ring jump:     1.73%
To center:     5.38%

Not to derail this thread further: I do think that it can be worthwhile to optimize for specific languages!

Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538
pieter said:

three character strings in my digrams, and strange non-standard characters

Could you paste some of these (and better yet, minimal corpora that generate them)?

The only such chars I can think of would be escaped ones like \n \t \\.

Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

Optimizing for specific languages can be useful in some situations, but it's a tricky track if you want to make something general. Most people use some English anyway, but how much will vary greatly. Such optimisation may be best for creating individual layouts then?

Colemak seems to do well for the most used languages and/or combinations of these, and the decision to keep the H and C in place aids that somewhat (although the C could've used a better placement for many languages). I didn't analyze bigrams though.

In Norwegian 'kj' is not that uncommon but it doesn't bother me. If I were writing a *lot* of Norwegian it might.

And yes, it's not quite right to be discussing Dutch layouts in a thread about Italian!

Last edited by DreymaR (10-Jul-2014 10:55:04)

*** Learn Colemak in 2–5 steps with Tarmak! ***
*** Check out my Big Bag of Keyboard Tricks for Win/Linux/TMK... ***

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

Nice analysis of yours, DreymaR :-) And yes, Colemak is a nice all-round layout :-)

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

@lalop - the problem was in LibreOffice, which put all sorts of characters in my corpus (tab, new line, new paragraph etc.).

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

After many experiments I came up with this layout.

Hands: 56% 43%   Fingers: 9.0% 11% 21% 16% 0.00% 0.00% 14% 12% 10% 8.0%
:  U  O  P  Y   X  M  L  B  V
A  I  E  N  H   G  D  R  T  S
<  >  ?  K  Q   F  C  W  J  Z

,  u  o  p  y   x  m  l  b  v
a  i  e  n  h   g  d  r  t  s
;  .  /  k  q   f  c  w  j  z

Fitness: 71426   Distance: 59945
Finger work: 11  Inward rolls: 7.60%  Outward rolls: 2.75% 
Same hand: 35.53%  Same finger: 1.06% 
Row change:  11.11%  Home jump:  0.56%  Ring jump: 1.36%  To center:  4.25%

How did I design it?
After trial & error these steps worked well:
1. Find text to analyse: your own reports, code, email; books; websites; in the language(s) you use. My input was 80% Dutch, 15% English and the rest was German, Spanish and some French.
2. Paste it into an editor. I used Medit, but many text editors will do.
3. Get rid of empty lines and line breaks (use the regex  \n  \t  and so on). LibreOffice is lousy at this, I found out. An editor does it many times faster. Save it as a .txt file. Encoding doesn't matter.
4. Analyse the file using lalop's Python program. Use -n1 for the output file allChars.txt and -n2 for the output file allDigraphs.txt.
5. Clean the two output files. There will still be unwanted characters in the .txt files, from page breaks, code from tables etc. I found that the humble nano editor is best for this (wow!). My builds are in the folder Builds, so I type:  nano ~/Builds/Typing-master/Typing-master/data/allDigraphs.txt  and nano opens that text file. Next I look for unwanted stuff, which is mostly stuff like \t\t and so on.  ctrl-W \  takes you to the next line that contains a  \  and  ctrl-K  removes that line. (Yes, a simple Python program could remove all these things as well, but I'm still learning Python...)
6. Make the layouts with MTGAP's program (you have to build it first on your machine). Optionally: experiment with different values for inRoll, sameHand etc.
7. Check the layouts on patorjk's website. Feed it chunks of text to see how your layout looks and performs.
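Step 5 above (jump to each line containing a backslash with ctrl-W, delete it with ctrl-K) could be automated along these lines. This is only a minimal sketch, not part of lalop's or MTGAP's programs: the function names are made up for illustration, and it assumes the output files hold one entry per line, so that dropping a whole line is safe. Note that it also drops legitimate entries such as \n and \t.

```python
from pathlib import Path


def drop_escaped_lines(lines):
    """Return only the lines that contain no backslash."""
    return [line for line in lines if "\\" not in line]


def clean_file(path):
    """Rewrite the file at `path` without its backslash-containing lines.

    Returns the number of lines that were removed.
    """
    p = Path(path).expanduser()
    lines = p.read_text(encoding="utf-8", errors="replace").splitlines()
    kept = drop_escaped_lines(lines)
    p.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(lines) - len(kept)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python clean_counts.py allChars.txt allDigraphs.txt
    for name in sys.argv[1:]:
        print(name, "removed", clean_file(name), "lines")
```

Keep a copy of the original files before running something like this, since it overwrites them in place.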

How does it perform ?
According to patorjk's website:
- It scores the best for Dutch texts. It clearly beats Colemak, Dvorak, Balance 12 and so on.
- It beats Colemak in English, German, French, Italian and Spanish as well. Often it ranks best, other times second or third after Balance 12 or HIEAMTSRN. But always ahead of Colemak and Dvorak.
- I tried some other variants, some more "rolling", others more "alternating", which scored well too, but not as well as aienhgdrts.

Conclusions
- Yes, Colemak is still a very good layout, and has all the advantages: it has been picked up by the market and so on.
- There are some other layouts that are very good as well: Dvorak, Balance 12, the standard (English) MTGAP layouts.
- Whether "QWERTY similarity" is an advantage, I don't know. In spoken languages I confuse languages that are too similar: I can't easily switch from French to Spanish, for instance. Switching to German is easier. Back to keyboards: I don't like one-hand key combos like ctrl-X and ctrl-V because they give me a sore hand. If those are important to you, Colemak may be a good choice. But to me they aren't.

- Making a dedicated layout is worthwhile (on paper; I have yet to try it out in real life!) but is of course more of a hassle. On the other hand: if you are going to learn a new layout, why not go all the way?

Last edited by pieter (15-Jul-2014 13:46:37)
Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538
pieter said:

3. Get rid of empty lines and line breaks (use the regex  \n  \t  and so on).

pieter said:

Next I look for unwanted stuff, which is mostly stuff like \t\t and so on.

Is there a particular reason for this?  Those seem to be a legitimate part of the corpus, as far as I can tell.

pieter said:

5. Clean the two output files. There will still be unwanted characters in the .txt files, from page-breaks, code from tables etc.

Perhaps try this for .odt (I presume) to .txt conversion.

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136
lalop said:
pieter said:

3. Get rid of empty lines and line breaks (use the regex  \n  \t  and so on).

pieter said:

Next I look for unwanted stuff, which is mostly stuff like \t\t and so on.

Is there a particular reason for this?  Those seem to be a legitimate part of the corpus, as far as I can tell.

Yes, you are right: it's part of the corpus. But MTGAP's program complained about escaped characters etc. and refused to run until I removed those. Furthermore, I didn't calculate the locations of the tab, return and space keys, so I felt free to remove them.

BTW, my 'dream' program would be a combination of your Python analyser + MTGAPs layout calculator + patorjk's layout visualiser +  mikekuehn's "try keyboards in the browser".  Imagine: feeding the program raw text and seeing it return a layout, visualized, that you can tweak and immediately try in the browser.....  ?  Or even better: also have it generate a Windows PLK file and Linux XKB files..... :-)  'Why don't you make it yourself?' one might say... Good question! Well, because I can't program at all :-(  That's why I am grateful for the work of you guys who can and do that! :-)

Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538
pieter said:

MTGAP's program complained about escaped characters etc, it refused to run until I removed those.

MTGAP's own allChars.txt and allDigraphs.txt contain \n and \t, at least.  Perhaps there were other escaped chars?

Also, even if you fix their locations, \n and \t ought to (in the normative sense; I don't know if MTGAP actually does so for the small layouts) affect the results, since they affect your hands' location.  For example, typing "h\nh" (in QWERTY) is significantly harder than "hh".
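To illustrate the point about "h\nh" versus "hh": a digraph counter that keeps whitespace records them as different sequences. The sketch below is not lalop's analyser, just a minimal stand-in; the function names and the sample string are assumptions made for the example.

```python
from collections import Counter


def count_digraphs(text):
    """Count every pair of adjacent characters, whitespace included."""
    return Counter(a + b for a, b in zip(text, text[1:]))


def escape(s):
    """Show backslash, newline and tab in escaped form for printing."""
    return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


if __name__ == "__main__":
    sample = "h\nh hh"  # made-up sample text
    for pair, n in count_digraphs(sample).most_common():
        print(escape(pair), n)
```

If newlines were stripped from the corpus first, "h\nh" would be counted as "hh", which is exactly the distortion described above.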

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

Update. I've done some more research, bigger corpus. I come up with a slightly different but better one:

Hands: 57% 42%  Fingers: 9.0% 10% 22% 16% 0.00% 0.00% 14% 12% 10% 8.0%
There is some left/right imbalance; especially the left middle finger has a lot to do.

/  U  O  P  Y   X  C  L  B  V
A  I  E  N  H   M  D  R  T  S
;  <  >  K  Q   F  G  W  J  Z

.  u  o  p  y   x  c  l  b  v
a  i  e  n  h   m  d  r  t  s
:  ,  ?  k  q   f  g  w  j  z

Inward rolls:  7.54%; Outward rolls: 2.94%; Same hand:     36.39%;  Same finger:   1.46%; Row change:    12.27%
Home jump:     0.66%; Ring jump:     1.44%; To center:     3.94%

I call this one Juli16

Patorjk scores this one much better than the previous one.
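For anyone who wants to check hand and finger load on their own text, here is a rough sketch. It is not how MTGAP's program or Patorjk's analyzer compute their figures: it simply counts key presses per finger, under the assumed (conventional) mapping of columns to fingers shown in the code, using the lower-case rows of Juli16 as printed above. The function names are made up for the example.

```python
from collections import Counter

# Lower-case rows of the Juli16 layout as printed above.
ROWS = [
    ". u o p y x c l b v".split(),
    "a i e n h m d r t s".split(),
    ": , ? k q f g w j z".split(),
]

# Assumption: each index finger covers two columns, every other finger one.
FINGER_OF_COLUMN = [
    "L pinky", "L ring", "L middle", "L index", "L index",
    "R index", "R index", "R middle", "R ring", "R pinky",
]


def finger_load(text, rows=ROWS):
    """Return the percentage of key presses per finger for `text`."""
    key_to_finger = {
        key: FINGER_OF_COLUMN[col]
        for row in rows
        for col, key in enumerate(row)
    }
    counts = Counter(
        key_to_finger[ch] for ch in text.lower() if ch in key_to_finger
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {finger: 100.0 * n / total for finger, n in counts.items()}


def hand_load(load):
    """Sum a finger-load dict into (left %, right %)."""
    left = sum(v for f, v in load.items() if f.startswith("L"))
    right = sum(v for f, v in load.items() if f.startswith("R"))
    return left, right


if __name__ == "__main__":
    load = finger_load("de snelle bruine vos springt over de luie hond")
    print(hand_load(load))
    for finger, pct in sorted(load.items()):
        print(f"{finger}: {pct:.1f}%")
```

Characters that are not on the three rows (space, digits, accented letters) are ignored, so the numbers will not match the optimizer's output exactly.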

The remarkable thing is: I compared it (in Patorjk) to the layouts Colemak, Dvorak and Balance 12, for various languages. These are the winners.

For Italian: winner is Juli16
For German: winner is Juli16
For French: winner is Juli16
For Dutch: winner is Juli16
For Spanish: winner is Balance12, #2 = Juli16, #3/#4 = Dvorak or Colemak
For English: winner is Balance12, #2 = Juli16, #3/#4 = Dvorak or Colemak

So there you have it. For the record: all these layouts are very good: Colemak, Dvorak, Balance12 and Juli16.

Now it's time for live testing :-) Edit: I will also implement some other ideas. Such as:
- making the q key type qu, because in most languages q is almost always followed by a u
- always having a space after ,
- making the modifiers sticky
- always having a space and Shift after .
- working with layers (thanks DreymaR!). In the numerical layer, the above rules do NOT apply, so I can type 1,23 or xq+z or 10.000
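The rules above can be tried out on paper with a small simulation before touching any real layout files. This is only a sketch of the idea: the function name and the `numeric_layer` flag are invented for the example, sticky modifiers are not modelled, and "Shift after ." is treated as a one-shot shift that applies only to the very next key.

```python
def expand(keystrokes, numeric_layer=False):
    """Return the text produced by `keystrokes` under the rules above.

    In the numeric layer the keys are passed through unchanged.
    """
    if numeric_layer:
        return keystrokes

    out = []
    shift_next = False  # one-shot Shift armed by a preceding "."
    for ch in keystrokes:
        if shift_next and ch.isalpha():
            ch = ch.upper()
        shift_next = False  # the one-shot Shift lasts for one key only

        if ch in ("q", "Q"):
            out.append(ch + "u")   # q key types "qu"
        elif ch == ",":
            out.append(", ")       # comma is always followed by a space
        elif ch == ".":
            out.append(". ")       # period: space, then Shift for next key
            shift_next = True
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(expand("ok.qi,no"))
    print(expand("10.000", numeric_layer=True))
```

Words where q is not followed by u would need an escape hatch, for example through another layer.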

Last edited by pieter (16-Jul-2014 14:27:35)
Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

Back to Italian!

I revamped the Colemak[eD] for Italian, as follows:

LBracket    è é [ {
RBracket    à ò ] }
LessGreater ù ì œ Œ

(çÇ on K, łŁ on L, ñÑ only on dead key)

This makes more sense to me than the previous version did: The àèù letters are far more common than éòì, and the others are only for special scripts (Venetian, Ligurian etc). Mnemonically, 'a' and 'o' are back vowels and 'u'/'i' are more frontal so they belong together. I feel it really came together now. :-)

Last edited by DreymaR (02-Jan-2015 13:06:05)


Offline
  • 0