
Colemak and Italian

  • Started by rebus
  • 36 Replies:
  • Reputation: 0
  • From: Italy
  • Registered: 04-Apr-2014
  • Posts: 14

Hello,

I want to learn Colemak after having wrist pain problems. I am a native Italian typist, and I haven't found many reports about Italian among Colemak users, except for spremino, who found the Dvorak layout better suited.

His experience has really made me consider learning Dvorak instead, but:
- I write mainly Italian, but sometimes English and occasionally other languages too;
- It seems to me that Colemak is better than QWERTY anyway, even for Italian.

So at this time I am still considering learning Colemak. If I manage, I will be happy to share my own experience.

Just a couple more questions: does any of you have other experience with Italian? How easy is it to type accented Latin letters (not only the Italian ones)?

Offline
  • 0
  • Reputation: 4
  • Registered: 08-Dec-2010
  • Posts: 656

No matter the language, I suppose the letters ARSTDHNEIO are among the ten or fifteen most typed keys, give or take 0.5%.

So different languages give you different levels of efficiency, but it should be over 95%, I suppose. Bigram frequencies may be another matter, however.

Since most people now use English a lot each day, Colemak can be universal.

Last edited by Tony_VN (06-Apr-2014 14:34:24)
Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

What do you feel about my attempt at an Italian Colemak?

I show it with a Wide mod in those images, so the keys in the middle are to the right on most keyboards but you should get the idea.

My Italian Colemak[eD] gives you the accented letters directly which is a simple solution. Programmers can switch between a standard Colemak and that to get the brackets easily accessible when they need them more, keeping the rest of the layout identical so it's an easy switch.

In that topic, I also refer to a discussion we had about other languages. Yes, Colemak is good for Latin languages at least. And as you say, people also write some English, so you need a layout that lets you do that easily too.

Last edited by DreymaR (07-Apr-2014 08:47:33)

*** Learn Colemak in 2–5 steps with Tarmak! ***
*** Check out my Big Bag of Keyboard Tricks for Win/Linux/TMK... ***

Offline
  • 0
  • Reputation: 0
  • From: Italy
  • Registered: 04-Apr-2014
  • Posts: 14
DreymaR said:

What do you feel about my attempt at an Italian Colemak?

Great job! Thanks.

I think the issues spremino raised were more about comfort (link) than about special letters. Anyway, I don't have enough experience with Colemak to judge (I've only studied the home row so far), so I guess I will learn basic Colemak first and then see how I feel about accented letters.

Just a few considerations:

It seems weird to me that the letter è turns out to be less frequent than rare letters like Q or Z. I think accented letters probably appear in only a few words, but these words are often very common (like "è" or "perché"), so they are actually used more than the linked study suggests.

Capitalized accented letters, on the other hand, are not used at all in Italian, with the exception of È (unless you are writing everything in uppercase, of course). But since È is not directly accessible on standard keyboards, Italians often write E' instead (though it is not correct). So the decision not to include capitals is right, provided there is some way to produce them (with dead keys or otherwise).

è is certainly the most used accented letter in Italian, so being able to type it without dead keys is nice. Maybe it would be even nicer to have it on the home row (but to be honest, that isn't the case on standard Italian QWERTY keyboards either).

Anyway, your layout seems very useful, but I will look into it more once I have really mastered Colemak :)

Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

No, the letter frequencies are measured on running text, so a letter counts every time it occurs, whether that is in many different words or in a few frequent ones. I find these frequencies a little unintuitive sometimes, but if the corpus is big and representative enough they should be trustworthy.
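
To make the point about text-based counting concrete, here is a minimal Python sketch. The sample sentence and the function name are made up for illustration and are not taken from the study discussed above. It shows that a letter such as è can be a whole, very common word and still account for a small share of all letters typed.

```python
from collections import Counter

def letter_frequencies(text):
    """Return each letter's share of all letters in the text, as a fraction."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}

# Made-up sample: "è" is two of the eight words, but only two of the letters.
sample = "Il gatto è nero perché è nato nero"
freqs = letter_frequencies(sample)

# Print the five most frequent letters as percentages.
for ch, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(ch, round(share * 100, 1))
```

On a real corpus the same counting would be run over a large, representative body of text rather than a single sentence.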

In my [eD] 'unified symbols' layouts, you can always access all accents by dead keys. So for an é you'll type the simplest way but for É you'll have to type AltGr+' then e. That's still easy enough to do for that rare letter. I took care to make the dead keys accessible for the most used accents.

I chose to leave the èé key exactly as it is on the standard Italian keyboard. Like with Colemak, there's no point in changing what's already good (enough)!

If you need this layout for Windows, I recommend PKL. You can download it from my sig topic and if you need I'll help you make the Italian Colemak[eD] for it. If you make it yourself, please send it to me so I can include it in my download. :)

Last edited by DreymaR (07-Apr-2014 14:50:24)


Offline
  • 0
  • Reputation: 0
  • From: Italy
  • Registered: 04-Apr-2014
  • Posts: 14

I'll give it a try and let you know. Thanks! I am planning to buy an ergonomic keyboard in the near future, but I guess it will be really useful until then!

[Edit] I downloaded the PKL file, but I'm a little confused about how to make changes. Could you help, please?

Last edited by rebus (07-Apr-2014 15:17:54)
Offline
  • 0
  • Reputation: 0
  • Registered: 03-Jul-2009
  • Posts: 189
rebus said:

I haven't found many experiences with Italian language among Colemak users, except for spremino which found Dvorak layout better suited.

Hello!  Sorry for being late to the party ;)

Yes, I found Dvorak better suited for Italian because:

1. vowels are on the opposite hand of AltGr: this makes it possible to add ergonomic and mnemonic AltGr combinations for accented vowels;
2. the only awkward sequences are PE and EP;
3. the apostrophe doesn't require shifting your hand.

Colemak didn't offer clear advantages, while:

1. L and C are an awkward reach for me;
2. FR, RF, RC, CR, SP and PS are awkward sequences for me;
3. vowels are on both hands: see above;
4. typing L and then apostrophe is awkward: I would rather have the apostrophe where the semicolon is.

DreymaR, your "worldwide" effort is commendable.  You are a layout geek ;-)  Your proposal for an Italian Colemak is interesting.  Unfortunately, I lack the time to evaluate it thoroughly but I will tell how I feel about it.

The problem with accented letters in Italian is that, although they are indeed infrequent, awkward reaches within text break the flow nonetheless, in my experience (I deem awkward reaches to be worse than same-finger sequences).  The placement of *è*, *é* and *ù* could be better. Your suggestion to use the Spanish layout could be a solution, but the placement of the grave accent and the apostrophe would need improving if that layout were to be used for Italian.

It would be useful if you could highlight dead keys in your layouts.  Anyway, thank you for your valuable contribution.

Cheers.

Last edited by spremino (17-Jun-2014 11:40:31)

Dvorak typist here.  Please take my comments with a grain of salt.

Offline
  • 0
  • Reputation: 0
  • From: Italy
  • Registered: 04-Apr-2014
  • Posts: 14
spremino said:

1. vowels are on the opposite hand of AltGr: this allows to add ergonomic and mnemonic AltGr combinations for accented vowels;

Good call. I recently switched to US layout and use dead keys for most accented letters (not a perfect solution but very versatile, and also I don't really like the position of accented letters in the Italian layout) but your point is a plus indeed.

spremino said:

typing L and then apostrophe is awkward

I agree about this as well!

The point is I don't know if I am willing to learn another layout after Colemak :D But maybe I will give Dvorak a chance in the future.

Offline
  • 0
  • Reputation: 0
  • Registered: 03-Jul-2009
  • Posts: 189
rebus said:
spremino said:

1. vowels are on the opposite hand of AltGr: this allows to add ergonomic and mnemonic AltGr combinations for accented vowels;

Good call. I recently switched to US layout and use dead keys for most accented letters (not a perfect solution but very versatile, and also I don't really like the position of accented letters in the Italian layout) but your point is a plus indeed.

An inferior solution on Colemak would be to swap Alt and AltGr and thus have all the vowels except "A" on the opposite hand of AltGr. You could enter "À" with AltGr + A or choose to put "À" somewhere on the right hand ("N" will be a problem if one day you learn Spanish and need "Ñ").

rebus said:
spremino said:

typing L and then apostrophe is awkward

I agree about this as well!

The point is I don't know if I am willing to learn another layout after Colemak :D But maybe I will give Dvorak a chance in the future.

I am not trying to claw you back into the Dvorak camp, but my experience is that after having learned an alternative layout, learning another one is much less effort.


Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

Bah, I wouldn't really recommend switching from Dvorak to Colemak or vice versa even though I did it. Both are good enough. Colemak is just sexier and if you're turned on by your layout then you're a sick frak and ... like me... :D

Problems like these can be solved independently in the layout you're using. The L-' bigram may be as simple as a bit of finger training which your fingers will like anyway and it's a *lot* simpler than a layout change! I don't think I mind that bigram but that may be my piano fingers talking. The AltGr issue... well, I have less of a problem with AltGr chords since I use a Wide ergo mod and if you're into that it shouldn't be an issue – and I prefer dead keys and/or special locale keys anyway. Using up five AltGr mappings for a single accent seems like such a waste and isn't really friendly anyway, is my feeling. ;)

Last edited by DreymaR (20-Jun-2014 13:01:37)


Offline
  • 0
  • Reputation: 0
  • Registered: 03-Jul-2009
  • Posts: 189
DreymaR said:

Bah, I wouldn't really recommend switching from Dvorak to Colemak or vice versa even though I did it. Both are good enough.

For English, sure they are, but for Italian?

I wouldn't have recommended switching if the original poster had been using Colemak for longer than a couple of months.  If Dvorak is better for Italian, then switching could still make sense.

Last edited by spremino (20-Jun-2014 14:00:21)


Offline
  • 0
  • Reputation: 0
  • From: Italy
  • Registered: 04-Apr-2014
  • Posts: 14

Damn spremino, you're really making me think about giving Dvorak a chance :D

Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

I see your point. It'll still depend on your English usage: How much English you type and what kind. If you're using Linux/Unix then Dvorak has disadvantages there as you know. If you're switching computers a lot so you have to type some QWERTY, likely the same (ymmv).

If you type a lot of Italian and feel the problems spremino outlined, then maybe Dvorak is indeed better for you. The only thing that really worried me there was the bigrams. If those bigrams are common enough in Italian, that's worrisome.


Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136
DreymaR said:

Yes, Colemak is good for latin languages at least.

Yes, but not as good as for English, since Colemak is optimized for English. And yes, the top 10 to 15 letters are the same in many Indo-European languages; enotadrilgh are roughly the top 10 letters.

My language is not Italian but Dutch. Dutch, German and English are linguistically close to each other. Italian is a bit further from English.

Yet spelling rules and words may differ a lot. For instance, the English word “the” is “de” in Dutch, so the keyboard roll “th” is not very useful in Dutch.
At the letter level, some of the more extreme differences are:
Z: English 0.1%, German 1.1%, Dutch 1.4%
K: English 0.8%, German 1.4%, Dutch 2.2%
C: English 3.5%, German 2.7%, Dutch 1.2%
Q: English 0.1%, German 0.02%, Dutch 0.01%

What this means is that QWERTY is bad in English, and even worse in Dutch. It also means that Colemak, just like MTGAP, Dvorak, Carpalx, Asset, Klausler etc., is a large improvement over QWERTY, but being optimized for English, these layouts score lower in Dutch.

So my guess is that yes, Colemak IS better than QWERTY in Italian, but it will be less optimal than it is for English.

Using mtgap's algorithm (https://github.com/michaeldickens/Typing) I found very different keyboards. There seem to be some bugs in the program, so take the results for what they are (comma in the shifted layer?? must be a mistake!):

J S C H , ; L K G B
A D E N P V R O I T
U X F Q

j s c h / ? l k g b
a d e n p v r o i t
u x : z y w m . f q

Fitness: 3905
Distance: 4590
Finger work: 0
Inward rolls: 19.12%
Outward rolls: 3.92%
Same hand: 45.10%
Same finger: 0.00%
Row change: 7.84%
Home jump: 0.00%
Ring jump: 0.00%
To center: 0.98%

You see that it's extremely rolly. See the nice inward rolls on the left hand: s c h (as in "school"; this trigram is much used in Dutch), e n ("en", meaning "and") and d e ("de", meaning "the").
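
As a rough illustration of what "inward roll" means for the lowercase layer listed above, here is a small Python sketch. The hand split (columns 0-4 left, 5-9 right), the finger assignment and the classification rules are simplifying assumptions of this sketch; they are not mtgap's actual scoring rules.

```python
# Lowercase layer of the layout above; columns 0-4 are assumed to be the left hand.
ROWS = [
    "jsch/?lkgb",
    "adenpvroit",
    "ux:zywm.fq",
]

# Assumed finger per column: 0 = left pinky ... 3 = left index,
# 4 = right index ... 7 = right pinky. Index fingers cover two columns each.
FINGER = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]

# Map every key to its (row, column) position.
POS = {key: (row, col) for row, keys in enumerate(ROWS) for col, key in enumerate(keys)}

def classify(bigram):
    """Label a two-key sequence using a simplified scheme (an assumption)."""
    (r1, c1), (r2, c2) = POS[bigram[0]], POS[bigram[1]]
    if (c1 < 5) != (c2 < 5):
        return "alternate"
    if FINGER[c1] == FINGER[c2]:
        return "same finger"
    if r1 != r2:
        return "same hand"
    # Inward means moving toward the middle of the keyboard.
    toward_center = c2 > c1 if c1 < 5 else c2 < c1
    return "inward roll" if toward_center else "outward roll"

for bigram in ["sc", "ch", "de", "en", "ed", "el"]:
    print(bigram, classify(bigram))
```

Weighting each label by the bigram's frequency in a corpus would give percentages comparable in spirit to the figures listed above, though not necessarily the same numbers.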


DreymaR said:

And as you say, people also write some English so you need to have a layout that lets you do that easily too.

The question, of course, is which optimization is better:
- an English-optimized keyboard for use in English, which is OK for other languages
- a keyboard that is optimized for a mix of languages (but 'master of none')
- or a keyboard that is optimized for the most used language (and OK for others)

I compare this to choosing the right clothes for a weekend at a Mediterranean beach with a visit to the snowy Alps. Do you pick the summer outfit? Warm ski clothing? Or something in between (and be too hot on the beach but still cold on the mountain)? There is no single "optimum".

My compromise is to optimize for the most used (90%) language, Dutch in my case. Meaning that 90% of the time I type on the (mathematically) best layout, and 10% is on a suboptimal layout. The Dutch layout would still be good for English and German (and even for French and Spanish). See above.

The other solution would be 'mixed text corpus optimization', in which I optimize for mostly Dutch, with some English, and bits of German, French and Spanish thrown in.

Here is a layout for a mixed Dutch/English corpus (sorry, no metrics):
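
A mixed corpus like this can be sketched in Python as a weighted blend of per-language frequency tables. The tiny sample texts, the 90/10 weights and the function names below are made up for illustration; real input would be large texts for each language.

```python
from collections import Counter

def normalize(counts):
    """Turn raw counts into shares that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def mix(corpora, weights):
    """Blend per-language frequency tables using the given weights."""
    mixed = Counter()
    for name, counts in corpora.items():
        for key, share in normalize(counts).items():
            mixed[key] += weights[name] * share
    return dict(mixed)

# Tiny made-up samples standing in for real corpora.
corpora = {
    "dutch": Counter("de herfststorm en de school"),
    "english": Counter("the autumn storm and the school"),
}
weights = {"dutch": 0.9, "english": 0.1}  # assumed 90/10 split

mixed = mix(corpora, weights)
print(sorted(mixed.items(), key=lambda kv: -kv[1])[:5])
```

Normalizing each language before weighting keeps the blend independent of how much text happens to be available per language.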
.gscwhkv
oadeu ltnir
xjyfq bmp:z

The nice thing is that digrams like kl and bl (more used in Dutch) do not have home row jumps. de (meaning the, so much used) is a nice roll on the left hand. oaeu are (dvorak-ish) on the left hand. I don't know where the comma went ?? Must be a bug, like I said.

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136
DreymaR said:

If you're using Linux/Unix then Dvorak has disadvantages there as you know.

You mean ls, I suppose? You do know about aliases, don't you? ;)

Last edited by pieter (04-Jul-2014 12:08:20)
Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

Having to implement an alias every time I come to a new computer is a disadvantage to me.

The issue should be relatively frequent bigrams, not letters that are already rare. The mtgap model should take care of this. If you do type more than 90% Dutch then I guess a layout optimized for that could be a benefit to you as long as you're willing to go your own way. I type so much English that a Norwegian layout would be little more than a pain in the butt for me – beyond providing the special symbols of course.


Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

@DreymaR - of course you are right! And I am willing to find stuff out myself, hope you don't mind me "borrowing" things from your Big Bag of Tricks - of course with all due attributions and feeding back my own tricks!  :-)  :-)

Offline
  • 0
  • Reputation: 210
  • From: Viken, Norway
  • Registered: 13-Dec-2006
  • Posts: 5,343

Attributions are nice, but the Big Bag is a public domain idea bank so feel free to take any ideas from it!


Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538
pieter said:

Using mtgap's algorithm (https://github.com/michaeldickens/Typing)  I found very different keyboards. There seems to be some bugs in the program, so take the results for what they are (comma in the shifted layer?? must be a mistake!)

Where do you find Dutch data for MTGAP (or better yet, data for multiple languages)?  I've been trying to find good non-English data for a long time now!

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

@lalop - it was a lot of work. I copied text from my own job, and from weblogs, journals etc., pasted it into an editor, filtered out empty lines, and fed the text into this analyzer: http://andong.azurewebsites.net/dvorak/ The output is text. I copy it into a spreadsheet (Libre Calc, in my case), sort it, and trim the figures so that useless info disappears. Then I make 2 text files, one with single character frequencies and one with digram frequencies. Those txt files are input for the mtgap program.

You can clearly see I am a non-programmer. My plan is to learn sufficient Python to write a program for it. Calculating freqs can't be a hard problem, right? An alternative is the carpalx program (written in Perl), which does both the frequency analysis and the keyboard layout; you can feed the program with text files. The mtgap program must be fed with frequency files (in txt format).  I haven't got around to installing & using carpalx.

If you are interested I can post my Dutch txt files. I also made some "mixed language" frequencies, based on German, Dutch and English. This gives the oadeu ltinr keyboard.  There are bugs in mtgap however, so I don't trust it completely.

The algorithms have much influence. When I build mtgap's ideal 'Dutch' keyboard on the patorjk site and feed it with Dutch prose, it scores worse than Colemak, Dvorak and several other layouts. I guess mtgap values other things than patorjk......

PS, I remember seeing public freq data for several languages, such as French, German, Spanish... I'll post when I find them again.

Last edited by pieter (04-Jul-2014 21:41:24)
Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538

Given a corpus (say, as a .txt file), it's essentially just a counting problem.  Here's an initial skeleton:

https://github.com/lalopmak/ngrams/blob … 2ngrams.py

Sample command:

python3 text2ngrams.py -n 2 -o output_file.txt corpus1.txt corpus2.txt corpus3.txt

where -n 2 indicates that you want 2-grams, and -o output_file.txt indicates to write results to output_file.txt. 
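
For readers who want to see the shape of such a tool, here is a minimal, self-contained Python sketch of character n-gram counting with -n and -o options. It is not the linked script: the function names, the tab-separated output format and the choice to skip n-grams that span a line break are all assumptions made for illustration.

```python
import argparse
import os
import tempfile
from collections import Counter

def count_ngrams(text, n):
    """Count every run of n consecutive characters, skipping runs that contain a newline."""
    counts = Counter()
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if "\n" not in gram:
            counts[gram] += 1
    return counts

def main(argv=None):
    parser = argparse.ArgumentParser(description="Count character n-grams in text files.")
    parser.add_argument("-n", type=int, default=1, help="n-gram length")
    parser.add_argument("-o", default="ngrams.txt", help="output file")
    parser.add_argument("corpus", nargs="+", help="input text files")
    args = parser.parse_args(argv)

    totals = Counter()
    for path in args.corpus:
        with open(path, encoding="utf-8") as handle:
            totals += count_ngrams(handle.read(), args.n)

    # One "ngram<TAB>count" line per entry, most frequent first (assumed format).
    with open(args.o, "w", encoding="utf-8") as out:
        for gram, count in totals.most_common():
            out.write(f"{gram}\t{count}\n")

# Demo with throwaway files instead of real corpora.
with tempfile.TemporaryDirectory() as tmp:
    corpus_path = os.path.join(tmp, "corpus.txt")
    output_path = os.path.join(tmp, "out.txt")
    with open(corpus_path, "w", encoding="utf-8") as handle:
        handle.write("perché è così\n")
    main(["-n", "2", "-o", output_path, corpus_path])
    with open(output_path, encoding="utf-8") as handle:
        print(handle.read())
```

Reading and writing with an explicit UTF-8 encoding is one way to keep accented characters intact regardless of the system locale.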

The hard part is the output format.  With the current version, unicode chars are being displayed like: \x80 (edit: in python3, unicode is represented as expected for me).  Not sure how that's supposed to go.  (This is controlled by the ngram_repr function, in case anyone wants to take a shot at it.)

Finally, text_replacements (for replacing, e.g. “ ” with the same character ") is probably incomplete, and I don't know of any general solution that doesn't also replace non-English characters such as ç é ä.
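
One possible approach, sketched below in Python, is an explicit translation table that lists only the typographic punctuation to be folded, so that letters such as ç é ä are never touched. The table contents and the function name are illustrative assumptions, not the contents of text_replacements, and the list is certainly not exhaustive.

```python
# Explicit map from typographic punctuation to the plain character typed on the keyboard.
# Only the listed characters change; everything else passes through unchanged.
SAME_KEY = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u00a0": " ",  # no-break space
})

def normalize_punctuation(text):
    """Replace typographic punctuation, leaving letters such as ç é ä untouched."""
    return text.translate(SAME_KEY)

print(normalize_punctuation("\u201cperch\u00e9\u201d, \u00e7a, l\u2019\u00e4"))
```

The trade-off is that every character to be replaced has to be listed by hand, which is why a table like this tends to stay incomplete.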



I wouldn't mind seeing your corpus files.  You can also host them on, say, github.

I also find the generated layouts very strange.  That roll rate is very high (the ones in mtgap's post tend to be at most 10%), and are / and : supposed to be more common than comma?  Also, does MTGAP take into account the non-English characters? If not, I'm not sure how useful it is for this purpose.

Last edited by lalop (17-Jul-2014 16:09:16)
Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

lalop, thanks. Due to real life stuff I won't have time this week, but I will analyze further next week. And I'll post my corpora :-)  Thanks for the Python suggestions.

If you play WordFeud in various languages, you'll see the difference in letter frequencies! I play in Dutch and (very occasionally) in Spanish; Spanish has way more vowels in the game. Dutch, German, English, Norwegian etc. words and syllables typically consist of more consonants than vowels. Languages like German and Dutch may have lots of consonants: "autumn storm" is "herfststorm" in Dutch and "Herbststurm" in German. You won't find these "consonant clusters" in Spanish, Italian or Portuguese..... Slavonic languages, on the other hand, are also very (maybe even more?) consonant rich: think of Slovenian trgovina (shop), the Croatian Mediterranean island Krk, the Czech Staroměstská radnice (Prague's Old Town Hall)...

The mtgap layouts that I calculated are strange indeed!

Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538
pieter said:

lalop, are we talking python2 or python3?  Possible syntax errors in lines 42 and 78....

Should now be compatible with both.

In fact, python3 seems to represent the unicode characters by default, so I might just be using it from now on.

Offline
  • 0
  • Reputation: 2
  • Registered: 25-Oct-2013
  • Posts: 136

@lalop, thanks. I modified the code a bit to work under python3: print is now a function so it needs (), and I inserted the line
from functools import reduce   because reduce is no longer a builtin in python3....   (I mean, I modified it on my own PC.... )

Then it worked. In my case the unicode characters did NOT work, but I'm stupid, it might be my computer (I must still set my locales.... )
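
For anyone hitting the same two issues, here is a tiny Python sketch of both changes. The dictionary of bigram counts is made up for illustration and has nothing to do with the actual script.

```python
from functools import reduce  # a builtin in Python 2, must be imported in Python 3

# Made-up bigram counts, just to have something to add up.
pair_counts = {"de": 5, "en": 3, "sc": 2}

# reduce folds the values into a single total, starting from 0.
total = reduce(lambda acc, n: acc + n, pair_counts.values(), 0)

# print is a function in Python 3, so the parentheses are required.
print("total bigrams:", total)
```

The import from functools also works on Python 2.6 and later, so adding it does not break the older interpreter.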

It works FAST !! Great work !  I cleaned the output, then fed it into mtgap. And that program gives me basically still adenpgitro keyboards (variants). When I set the penalties for same hand higher, to force it into alternation, it gives me variants of this keyboard:

,bscy :kvmp
oadeu ltnir
<>xjq wghzf

This is also what I got with standard penalties (for a mixed English/Dutch corpus). It's Dvorakish. The nice thing is the very low number of home row jumps (in Dutch).

I must try them out. I will get the 'try keyboards in the browser' tool from GH and load these layouts to try them out.

Last edited by pieter (08-Jul-2014 21:35:15)
Offline
  • 0
  • Reputation: 0
  • Registered: 04-Apr-2013
  • Posts: 538

I've just made a fix to convert some same-key chars (e.g. “ ” into " - see the list corpus_replacements for what is currently replaced).  Not sure if MTGAP already treats such chars as originating from the same key or not.

How did you have to clean the output?  Was it only the unicodes? I'm not sure why those wouldn't work for your system (I'm running Ubuntu, if that makes any difference).

Last edited by lalop (08-Jul-2014 23:19:11)
Offline
  • 0