ON TIMING
Collecting timing data from my own typing was the first idea I tried. One can (reportedly) get sub-millisecond resolution using Java's System.nanoTime(), although I stuck to 1 ms resolution. Unfortunately, the results were useless.
Part of the problem is that seasoned typists tend to press the next key before releasing the current one. For instance, when typing IN on QWERTY, I'm already most of the way through pressing N by the time I'm ready to release I. So the true cost of pressing N is masked: it looks like my fingers spent, say, 10 ms getting from I to N, when in reality it took much longer.
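To make the masking effect concrete, here's a minimal sketch, assuming each key event is logged as a key, a press/release flag, and a timestamp taken from System.nanoTime(). The class names (OverlapDemo, KeyEvent, Transition) are hypothetical, and the log in main is a made-up example of typing IN, not real measurements. For each pair of consecutive presses it reports the press-to-press interval and the release-to-press interval; a negative release-to-press value means the next key went down before the previous one came up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

public class OverlapDemo {
    // One logged key event. A real logger would take the timestamp from
    // System.nanoTime() inside a key listener; here it is supplied directly.
    static final class KeyEvent {
        final char key;
        final boolean down; // true = press, false = release
        final long nanos;

        KeyEvent(char key, boolean down, long nanos) {
            this.key = key;
            this.down = down;
            this.nanos = nanos;
        }
    }

    // Timing of one key-to-key transition, in milliseconds.
    static final class Transition {
        final char from, to;
        final double pressToPressMs;   // next press minus previous press
        final double releaseToPressMs; // next press minus previous release; negative = overlap

        Transition(char from, char to, double pressToPressMs, double releaseToPressMs) {
            this.from = from;
            this.to = to;
            this.pressToPressMs = pressToPressMs;
            this.releaseToPressMs = releaseToPressMs;
        }
    }

    // First release of `key` at or after `afterNanos`. Compares by subtraction,
    // since nanoTime() values may overflow and should only be compared as differences.
    static OptionalLong releaseTime(List<KeyEvent> events, char key, long afterNanos) {
        for (KeyEvent e : events) {
            if (!e.down && e.key == key && e.nanos - afterNanos >= 0) {
                return OptionalLong.of(e.nanos);
            }
        }
        return OptionalLong.empty();
    }

    // Pairs each press with the press that follows it. Assumes events are in time order.
    static List<Transition> analyze(List<KeyEvent> events) {
        List<Transition> out = new ArrayList<>();
        KeyEvent prev = null;
        for (KeyEvent e : events) {
            if (!e.down) {
                continue;
            }
            if (prev != null) {
                OptionalLong release = releaseTime(events, prev.key, prev.nanos);
                if (release.isPresent()) {
                    out.add(new Transition(prev.key, e.key,
                            (e.nanos - prev.nanos) / 1e6,
                            (e.nanos - release.getAsLong()) / 1e6));
                }
            }
            prev = e;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up log of typing "IN": N goes down at 60 ms, before I comes up at 80 ms.
        List<KeyEvent> log = List.of(
                new KeyEvent('I', true, 0L),
                new KeyEvent('N', true, 60_000_000L),
                new KeyEvent('I', false, 80_000_000L),
                new KeyEvent('N', false, 150_000_000L));
        for (Transition t : analyze(log)) {
            System.out.printf("%c -> %c: press-to-press %.1f ms, release-to-press %.1f ms%n",
                    t.from, t.to, t.pressToPressMs, t.releaseToPressMs);
        }
    }
}
```

With those made-up timings, N goes down 20 ms before I is released, so the release-to-press interval for I -> N comes out negative even though 60 ms passed between the two presses, which is the kind of masking described above.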
The only good way around this problem that I could think of was to measure only same-finger transition times, like D to E on QWERTY; a single finger can't start pressing the next key until it has let go of the current one, so overlap can't hide the cost. Sadly, this turned out to be useless as well, at least for the kind of layout I wanted to grow. Why? Well, here were my top 10 fastest transitions on QWERTY:
F -> R
S -> X (!)
F -> V
L -> O (!)
D -> C
K -> I
; -> P (!!!)
J -> U
D -> E
H -> U
So, what's wrong with that? Well, even though I'm strongly right-handed, the top three fastest transitions are all on the left hand! On the right hand, the ring and pinky fingers rate faster than the index finger! (In fact, every other finger on the right hand rates faster than the index finger.) This means that layouts evolved from this timing data will tend to be left-hand heavy (like QWERTY), and that the right index finger will probably end up massively under-used. In other words, the resulting layout would probably be worse than QWERTY! Why bother?
'Course, it's possible you'll get more intuitive data for yourself, but I wouldn't count on it.
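A ranking like the top-10 list above boils down to averaging each transition's samples and sorting. Here's a minimal sketch; the class name TransitionRanking, the string labels, and the use of the mean (rather than, say, the median) are all assumptions, since the post doesn't say how its numbers were aggregated. The sample times in main are made up for illustration, not anyone's actual measurements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransitionRanking {
    // Transition label (e.g. "D -> E") mapped to every time measured for it, in ms.
    private final Map<String, List<Long>> samples = new HashMap<>();

    void addSample(String transition, long millis) {
        samples.computeIfAbsent(transition, k -> new ArrayList<>()).add(millis);
    }

    // Mean time for one transition; assumes at least one sample was recorded.
    double mean(String transition) {
        List<Long> times = samples.get(transition);
        long sum = 0;
        for (long t : times) {
            sum += t;
        }
        return (double) sum / times.size();
    }

    // Up to n transitions, fastest (lowest mean) first; ties broken alphabetically.
    List<String> fastest(int n) {
        List<String> labels = new ArrayList<>(samples.keySet());
        labels.sort((a, b) -> {
            int byTime = Double.compare(mean(a), mean(b));
            return byTime != 0 ? byTime : a.compareTo(b);
        });
        return new ArrayList<>(labels.subList(0, Math.min(n, labels.size())));
    }

    public static void main(String[] args) {
        // Made-up sample times in ms, for illustration only.
        TransitionRanking r = new TransitionRanking();
        r.addSample("D -> E", 100);
        r.addSample("D -> E", 120);
        r.addSample("F -> R", 80);
        r.addSample("F -> R", 90);
        r.addSample("K -> I", 95);
        System.out.println(r.fastest(2)); // prints [F -> R, K -> I]
    }
}
```

A median would be less sensitive to the occasional stumble; only the body of mean() would need to change.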
ON BREEDING
It's not worth worrying too much about your breeding strategy. I've dabbled quite a bit in genetic algorithms (mostly with real-number "genes" rather than bit-based ones) and read a fair number of journal articles on the subject, and I've concluded that, generally, adding complexity to one's breeding strategy doesn't help. With these kinds of algorithms, you want to stay as close to random as possible while still converging on a "good" solution in a reasonable amount of time. See the "No Free Lunch Theorems" for more info.
Yes, life does better with sexual reproduction, but that's largely because most tweaks to DNA result in totally non-viable organisms: organisms that, in GA parlance, have the worst possible fitness value because they never even get the chance to live, let alone breed. Thus, you're a lot more likely to get a viable organism by mixing DNA from two known-viable organisms (crossover) than by randomly dinking with the DNA of a single organism (mutation). In the vast majority of GA applications, though, all "organisms" are viable (i.e., capable of competing and reproducing); they may not be good, but comparatively few are the worst.
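To illustrate both points (keep the breeding simple, and every organism is viable), here's a minimal sketch of a mutation-only GA for a layout stored as a permutation of keys. It's an assumption-laden illustration, not the program described in this post: the names, the truncation selection, the swap mutation, and the toy cost function in main are all hypothetical. Because swapping two keys of a valid layout always gives another valid layout, every child is viable.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class SimpleLayoutGA {
    // Swap mutation: exchanging two keys of a valid layout always yields another
    // valid layout (same keys, no duplicates), so every child is "viable".
    static char[] mutate(char[] parent, Random rng) {
        char[] child = parent.clone();
        int i = rng.nextInt(child.length);
        int j = rng.nextInt(child.length);
        char tmp = child[i];
        child[i] = child[j];
        child[j] = tmp;
        return child;
    }

    // Mutation-only GA with truncation selection; lower cost = fitter.
    // Requires popSize >= 2. Cost is recomputed on every comparison, which is
    // wasteful but keeps the sketch short.
    static char[] evolve(char[] start, ToDoubleFunction<char[]> cost,
                         int popSize, int generations, long seed) {
        Random rng = new Random(seed);
        Comparator<char[]> byCost = Comparator.comparingDouble(cost);
        char[][] pop = new char[popSize][];
        pop[0] = start.clone(); // keep the starting layout so the result can't be worse
        for (int k = 1; k < popSize; k++) {
            pop[k] = mutate(start, rng);
        }
        int survivors = popSize / 2;
        for (int g = 0; g < generations; g++) {
            Arrays.sort(pop, byCost);
            // Keep the better half unchanged; replace the rest with mutants of random survivors.
            for (int k = survivors; k < popSize; k++) {
                pop[k] = mutate(pop[rng.nextInt(survivors)], rng);
            }
        }
        Arrays.sort(pop, byCost);
        return pop[0];
    }

    public static void main(String[] args) {
        // Toy cost (hypothetical): number of positions that differ from a target order.
        char[] target = "ETAOINSH".toCharArray();
        ToDoubleFunction<char[]> cost = layout -> {
            int wrong = 0;
            for (int k = 0; k < layout.length; k++) {
                if (layout[k] != target[k]) {
                    wrong++;
                }
            }
            return wrong;
        };
        char[] best = evolve("HSNIOATE".toCharArray(), cost, 20, 200, 42L);
        System.out.println(new String(best) + " cost=" + cost.applyAsDouble(best));
    }
}
```

Since the better half always survives unchanged, the best cost never gets worse from one generation to the next. A real layout optimizer would presumably plug in a cost built from typing-effort data instead of the toy one here.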
ON LANGUAGES
Java is fine for these kinds of applications. Modern Java is fast and free, has a great free IDE (Eclipse), and has a massive standard library. That goes tenfold if you're a new programmer: with C and C++, you'll spend the bulk of your time fighting the language rather than your problem. C++, in particular, has an unholy amount of arcana and a language standard at least as opaque as any law book.
ON USER INTERFACES
I really wouldn't worry about this at all. Tweaking the UI will easily steal all your time from improving your GA, and pretty much no one is going to use it anyway. Of the tiny fraction of people who want to switch from QWERTY, an even tinier fraction want to switch to a layout of their own design, and that microscopic group probably all want to write their own software to do it, because they think they're smarter than everyone else. :)
Last edited by Phynnboi (18-Dec-2008 02:41:49)