Viewing posts for the category Omarine User's Manual
Machine learning techniques may sound superior, but its application is very practical in life. So big tech corporations like IBM and Google have invested a lot of money and used their cloud to trade in machine learning services. Applications can be at national level such as wildfire disaster analysis, weather forecast; average as supporting decision making; small as predicting housing prices and income.
We use FPP for a simple and familiar problem, enough to illustrate how to use it. That is the Tennis problem.
1) Extracting fpp-4.0.2.tar.xz
After downloading fpp-4.0.2.tar.xz, you create a folder called fpp, place the file there, change to the directory and extract the file as follows:
We do experiments to compare TensorFlow - A popular machine learning program that trains networks based on loss functions, and FPP - Network training based on genetic algorithms.
In inequality
d (f (P_{u}), f (P_{v})) ≤ qd (P_{v}, P_{u})
the coefficient q is called the Lipschitz constant. It determines the convergence speed of the contraction mapping
The question of the number of steps needed to train a neural network is answered by the convergence of genetic algorithms according to the principle of contraction mapping. Having the number of training steps does not require a hyperparametric optimization method in the constraint domain, but it is formed with arbitrary values according to each problem. It is similar to unsupervised learning, there is no monitoring or adjustment process for the number of steps to train, because it exists only as a consequence, when genetic algorithms converge. That's when the network population returns to itself after a generation. This point is called the fixed point.
Metric space
During evolution, the algorithm creates a sequence of network populations. Call P_{t} as a population in the t-generation. The initial population will be P_{0}. We define a metric space as follows:
Let S be a set of populations. A metric d is defined:
d (P_{u}, P_{v}) = | E_{u} - E_{v} |
In which E_{u} and E_{v} are the average errors of the populations at the u and v generations, ie P_{u} and P_{v}.
We design the population to always improve after each generation, ie E_{v} <E_{u} ∀ P_{u}, P_{v} ∈ S | u <v.
(S, d) is a metric space due to ∀ P_{u}, P_{v}, P_{t} ∈ S:
d (P_{u}, P_{v}) ≥ 0
d (P_{u}, P_{v}) = 0 ⇔ P_{u} = P_{v}
d (P_{u}, P_{v}) = d (P_{v}, P_{u})
d (P_{u}, P_{v}) ≤ d (P_{u}, P_{t}) + d (P_{t}, P_{v})
For the fourth characteristic, without losing generality we assume u ≤ v. There are three situations:
If t ≤ u
d (P_{u}, P_{v}) = E_{u} - E_{v}
d (P_{t}, P_{v}) = E_{t} - E_{v}
Since t ≤ u so E_{t} ≥ E_{u}, therefore E_{t} - E_{v} ≥ E_{u} - E_{v}. Ie d (P_{u}, P_{v}) ≤ d (P_{t}, P_{v}). Because d (P_{u}, P_{t}) ≥ 0 so d (P_{u}, P_{v}) ≤ d (P_{u}, P_{t}) + d (P_{t}, P_{v}).
It is similar to t ≥ v.
If u <t <v,
d (P_{u}, P_{t}) = E_{u} - E_{t}
d (P_{t}, P_{v}) = E_{t} - E_{v}
Therefore
d (P_{u}, P_{t}) + d (P_{t}, P_{v}) = E_{u} - E_{t} + E_{t} - E_{v} = E_{u} - E_{v} = d (P_{u}, P_{v})
The population sequence {P_{t}} corresponds 1 - 1 with the average error sequence {E_{t}}. The sequence {E_{t}} is a sequence of real numbers that is monotonically decreasing and bounded from below (by 0) so it converges in ℝ. Correspondingly, the sequence {P_{t}} converges in S.
So the metric space (S, d) is a complete metric space.
Note that the above converged population sequence is just an example that illustrates the completion of space. It iterates infinitely to a local minima. We need another convergent sequence, "much shorter" with the contraction nature of the mapping, and go to the only fixed point, which is the global optimization.
Contraction mapping
Convergence occurs at the end of evolution. At this point the optimal scheme has formed and the algorithm preserves the scheme, the genetic operations make the population change less. So we can easily ensure that the population's evolution is slowed down.
For generations u <v, we have
E_{v} - E_{v+1} <E_{u} - E_{u+1}
or
E_{u+1} - E_{v+1} <E_{u} - E_{v}
Consider the mapping f: S → S | f (P_{t}) = P_{t+1}. This is evolutionary mapping. It impacts on the population of t-generation and produces the t + 1 generation population.
d (f (P_{u}), f (P_{v})) = d (P_{u+1}, P_{v+1}) = E_{u+1} - E_{v+1} <E_{u} - E_{v} = d (P_{u}, P_{v})
We can also set a coefficient q ∈ [0, 1) such that ∀ P_{u}, P_{v}, ∈ S
d (f (P_{u}), f (P_{v})) ≤ qd (P_{u}, P_{v})
So f is a contraction mapping.
The algorithm will converge to the fixed point. We just need to check if
f (P_{t*-1}) = P_{t*} = P_{t*-1}
or
E_{t*} - E_{t*-1} = 0
then stop.
What is interesting is that the time of stopping the algorithm determines the number of evolutionary generations that are very different for different problems.. They only share a common feature of convergence. The traditional optimization method will be difficult when the parameter is in a wide range. The superiority of genetic algorithms is there
Can't see mail in Inbox? Check your Spam folder.