Sure thing, but I think you might want to go back and reread the tutorial on simulated annealing:

"Usually multiple runs of the simulation are necessary to explore the family of solutions to identify the best one because the phase space of solutions is generally complicated and many local minima exist.

Simulated annealing is one of many types of stochastic optimization algorithms. Because simulated annealing has its roots in physics, the quantity that measures a solution's fitness is frequently refered to as the energy. The algorithm's role is to therefore find the solution for which the energy is minimum. The process of finding this solution begins with starting with some initial guess, which is frequently a random state.

For example, you would start with a keyboard layout on which the keys are randomly organized. Since each potential solution has an associated energy, you compute the energy (E0) and set this value aside. Next, you adjust the solution slightly to obtain a marginally different solution. The size of this adjustment is cruicial. If the step is too small, it make take forever to move to a significantly different state. If the step is too large, you risk jumping about the solution space so quickly that you will not converge to a solution. In the case of the keyboard, a reasonable step from one solution to another is a single or double keys swap. Once the new solution has been derived, you compute its energy (E1).

Now that you have E0 and E1 is where things get interesting. Let dE = E1 - E0 (I am using dE loosely here - it's not a differential quantity but just a difference). You might suppose that if dE < 0 then you should value the new solution over the old one, since its energy is lower. You'd be right - in this case, simulated annealing transitions to the new solution and continues from there. Let's skip over the dE = 0 case for now. What about when dE > 0? In this case, the new solution is less desirable than the old one, since its energy is higher. Do we discard the new solution? Let's consider what would happen if we did.

If all transitions with dE > 0 are rejected, the algorithm will be greedy and descend towards the lowest-energy solution in the vicinity of the current state. If the step from one solution to another is small, then a minimum energy state will be found. However, this may be a local minimum instead of a global one. If all paths from the local minimum to the global minimum involve an increase in energy (imagine the local minimum being a trough), these paths will never be explored. To mitigate this, and this is where the "stochastic" part of simulated annealing comes into play, the algorithm probabilistically accepts transitions associated with dE > 0 according to the following probability

where dE is the energy difference and T is a parameter that is interpreted as a temperature. The basis for this interpretation originates from the original motivation for the form of of the transition probability.

Given a candidate transition and dE > 0, the algorithm samples a uniformly distributed random number r in the range [0,1) and if r

The role of the parameter T is very important in the probability equation. This parameter is continuously adjusted during the simulation (typically monotonically decreased). The manner in which this parameter is adjusted is called the cooling schedule, again with the interpration of T as a temperature. Initially, T is large thereby allowing transitions with a large dE to be accepted. As the simulation progresses, T is made smaller, thereby reducing the size of excursion in to higher energy solutions. Towards the end of the simulation, T is very small and transitions with only very slight increases in E are allowed.

In carpalx, T is cooled exponentially as a function of the iteration, i.

The exponential form of T(i) has nothing to do with the exponential form of the transition probability. It's just a cooling schedule that I adapted, admittedly without testing any other form.

The effect of the cooling schedule is to allow the algorithm freedom of movement throughout the solution space at first (large T), then progressively converge to a solution with a minimum (medium T) and converge to a minimum (small T). The optimization simulation is typically run many times, each time with a different starting state.

parameter values

Choosing values for p0, T0, k and N should be done carefully. The possible range of values for E must be anticipated and T will need to be scaled appropriately to produce useful probability values.

Figure 1 | Cooling schedule for the optimization. As the optimization runs, the probability of accepting a less desirable layout exponentially decreases.

For carpalx, the typing effort ranges from 9 (QWERTY, mod_01) to about 4.5-5 (optimized, mod_01). Consequently, I have chosen the parameters as follows

T0 = 10

k = 10

T0 scales the temperature so that at maximum energy, E/T is approximately 1. With a value of k=10, the temperature drops by 1/e for every 1/10th of the simulation. Initially, the temperature is T(i=0) = 10. At i=1000, (1/10th of the simulation), T(1000)=10/e. At i=2000, (2/10th of the simulation), T(2000)=10/e2, and so on. At the end of the simulation, T(end) = 10/e10 = 4.5x10-4. If you want the temperature to drop faster, use a larger value of k. To start with a cooler system, set a lower T0.

For all simulations, I've used

p0 = 1

N = 10000

"

tl;dr: There is a single corpus of input, against which he chooses a random starting configuration for a keyboard. Each step introduces a random keyswitch which is evaluated for goodness. If it's better, it's kept. If not, there is a probability it will be accepted anyway, a probability which decreases as each step is taken. Finally, the algorithm converges on a minimum that is best for the area within a monotonic slope from that point.

Because the problem space is non-monotonic, the given result for a run cannot be proven to be optimum across the problem space. Because the problem space is np-hard, it cannot be solved in polynomial time (ie, brute-forced) so your result is only an heuristically chosen optimum and cannot be proven so, only disproven.