Introduction - Defining the criteria of heuristic and meta-heuristic behaviour:
Heuristic algorithms typically intend to find a good
solution to an optimization problem by ‘trial-and-error’ in a reasonable amount
of computing time. Here ‘heuristic’ means to ‘find’ or ‘search’ by trials and
tallying the hits and misses. There is no guarantee of finding the best or optimal
solution, though the result is usually an improvement on an educated guess.
Any reasonably good solution, often suboptimal or near
optimal, would be good enough for such problems. Broadly speaking, local search
methods are heuristic methods because their parameter search is focused on the
local variations, and the optimal or best solution can be well outside this
local region. However, a high-quality feasible solution in the local region of
interest is usually accepted as a good solution in many optimization problems
in practice if time is the major constraint.
Metaheuristic algorithms are higher-level heuristic
algorithms. Here, ‘meta-’ means ‘higher-level’ or ‘beyond’, so a metaheuristic
literally finds the solution based on an "endowment", often based on a power law, that creates the appearance of learned behaviour. Moreover, like heuristic algorithms, metaheuristic algorithms also include randomization and trial-and-error process elements. Together, the endowment and the randomization create behaviour in the metaheuristic that is greater than the sum of the individual parts, making for non-programmed, emergent behaviours.
Broadly speaking,
metaheuristics are considered higher-level techniques or strategies that
intend to combine lower-level techniques and tactics for exploration and
exploitation of the huge parameter search space.
This kind of search can be pictured as follows: imagine a particle travelling across a search-space landscape function f(x) that contains a shallow hole and a deep hole.
The deep hole represents the global optimum of the search space (i.e. the best solution possible for f(x)), while the shallow hole represents an approximation to it, i.e. a local optimum of f(x).
The objective of the search procedure across this space is therefore to get the particle to the bottom of the global optimum and, furthermore, to keep it there.
If the procedure simply exploits an instruction to move downhill in whatever direction it assumes the best solution lies, the particle will likely fall into a local optimum rather than the global one, and it will stay there: a procedure designed to find the global optimum by travelling downhill cannot go back uphill.
However, if you introduce randomization, which could be noise or heat (i.e. "simulated annealing"), so that a random motion is superimposed on the exploitative procedure searching for a hole to go down, then with the "right" amount of randomness the particle can escape the shallow local optimum but cannot escape the deep global optimum.
The "right" amount of randomness introduced to the exploitative procedure is set by some predetermined or learned tuning of the noise injected into the system, such that this global optimization can happen if the search is allowed to run over a long enough period of time. Running the metaheuristic procedure long enough will then allow a global optimum to eventually be found. This is, in essence, what a metaheuristic algorithm is all about.
Meta-Heuristics and Universal Power Law-Based Signalling Algorithm:
The most basic meta-heuristic signalling algorithm we can
construct is one based on some sort of power law for signalling.
Light signalling can of course be represented by a power
law. Light signal transmission, like light in general, obeys the inverse-square law in its propagation
through space. That is to say, the intensity
of the light, I, is inversely proportional to the square of the distance r from the light
source: I \propto 1/r^2.
Therefore the light signal gets weaker and weaker as the
distance increases.
The meta-heuristic algorithm therefore must contain a
function whose signalling power between each pair of transceiver nodes decreases
monotonically with the distance, r, under a light absorption coefficient g for the physical medium.
For a given physical medium, the Gaussian form of the
light intensity is then determined by:

I(r) = I_0 \, e^{-g r^2}

where I_0 is the intensity at the source (r = 0).
The universal power-law signalling function is
then written to be monotonically decreasing:

\beta(r) = \beta_0 \, e^{-g r^m}

where r is the distance between the different nodes,
m is the power-law parameter for the signal (for light
signals m = 2; for signals that use audio, chemical, and other power laws it
will be different),
and \beta_0 is the emitted signal at r = 0.
The distance between any two nodes, i and j at X_i and X_j respectively, is the Cartesian distance:

r_{ij} = \lVert X_i - X_j \rVert = \sqrt{\sum_{k=1}^{d} \left( X_{i,k} - X_{j,k} \right)^2}

where X_{i,k} is the k-th component of the spatial coordinate X_i of the i-th
node, and d is the number of dimensions.
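In code, the distance and the signalling function take only a few lines. This is a sketch: the values of g and beta0 are illustrative choices, with m = 2 for light as above.

g = 1; m = 2; beta0 = 1;                       % m = 2 for light; beta0 is the signal at r = 0
dist   = @(Xi, Xj) sqrt(sum((Xi - Xj).^2));    % Cartesian distance between two 1-by-d nodes
signal = @(r) beta0 * exp(-g * r.^m);          % monotonically decreasing signalling power
signal(dist([0 0], [2 2]))                     % weaker than signal(dist([0 0], [1 1]))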
The time evolution of each individual node, i, over its signalling
cycle is then governed by the fact that each typical node, i, is coupled to the most intense, i.e. brightest, node it sees, j, by the following equation:

X_i^{t+1} = X_i^t + \beta_0 \, e^{-g r_{ij}^2} \left( X_j^t - X_i^t \right) + \alpha \, \epsilon_i^t
The second term is the signalling term; in most implementations \beta_0 = 1. This is the physical endowment, the power-law exploitation process in the metaheuristic.
The third term is the term that defines the
randomization of the possible propagation paths taken, which uses a
randomization parameter, \alpha, which
in most implementations is distributed in the domain [0, 1], together with a random vector \epsilon_i^t. This is the heuristic, "trial-and-error", exploration process in the metaheuristic.
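A single such update, in code, makes the two roles explicit. The positions and parameter values below are placeholders, not values taken from the text:

alpha = 0.2; beta0 = 1; g = 1; d = 2;     % illustrative parameter choices
Xi = [1.0 1.0]; Xj = [0.2 0.1];           % node i and the brightest node j it sees
r2 = sum((Xi - Xj).^2);                   % squared distance r_ij^2
Xi = Xi + beta0*exp(-g*r2)*(Xj - Xi) ...  % second term: power-law endowment (exploitation)
        + alpha*(rand(1,d) - 0.5);        % third term: randomization (exploration)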
Interestingly, we can write this equation symmetrically as:

X_i^{t+1} = (1 - \beta) X_i^t + \beta X_j^t + \alpha \, \epsilon_i^t

which, when presented in terms of node distance, reads:

X_i^{t+1} = \left( 1 - \beta_0 e^{-g r_{ij}^2} \right) X_i^t + \beta_0 e^{-g r_{ij}^2} X_j^t + \alpha \, \epsilon_i^t

We can then define the signalling Activation Function, A:

A(r) = \beta_0 \, e^{-g r^2}

which reduces our equation to a more fundamental signal field representation:

X_i^{t+1} = \left( 1 - A(r_{ij}) \right) X_i^t + A(r_{ij}) X_j^t + \alpha \, \epsilon_i^t
Based on these formulae, we have the corresponding rules:
· Each node in the field will emit and absorb discrete light signals equally.
· The signals are proportional to the intensity of the light emitted, both of which decrease in proportion to increasing distance between the nodes.
· For any two flashing nodes in the field, the signal absorbed with the highest intensity will induce the strongest coupling.
· If a node detects no signal of higher intensity than its own, it will designate itself as isolated and signal randomly.
· The light intensity of any signalling node is determined by the landscape of the field, itself determined by the nature of the activation function.
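Putting these rules and the signal field representation together gives a compact search loop. The sketch below is an assumed minimal implementation, not a prescribed one: the landscape f, the node count, and the parameter values are all illustrative.

n = 25; d = 2; T = 100;                 % nodes, dimensions, signalling cycles (assumed)
beta0 = 1; g = 1; alpha = 0.2;          % endowment and randomization settings (assumed)
f = @(X) sum(X.^2, 2);                  % illustrative landscape; global optimum at the origin
X = 4*rand(n, d) - 2;                   % random initial node positions
for t = 1:T
    I = 1 ./ (1 + f(X));                % emitted intensity: brighter where f is lower
    Xnew = X;
    for i = 1:n
        r2 = sum((X - X(i,:)).^2, 2);   % squared distances r_ij^2 to every node
        seen = I .* exp(-g*r2);         % received signal strengths at node i
        seen(i) = -Inf;                 % a node does not couple to itself
        [~, j] = max(seen);             % the brightest node that i sees
        if I(j) > I(i)                  % couple to it via the activation A(r_ij)
            A = beta0*exp(-g*r2(j));
            Xnew(i,:) = (1 - A)*X(i,:) + A*X(j,:) + alpha*(rand(1,d) - 0.5);
        else                            % no brighter signal detected: isolated, signal randomly
            Xnew(i,:) = X(i,:) + alpha*(rand(1,d) - 0.5);
        end
    end
    X = Xnew;
end
[fbest, ibest] = min(f(X));             % best node found after T signalling cycles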
Observing Metaheuristic Exploration vs. Exploitation in a Search Space:
From the rules derived from the physical parameters used
to construct the pre-programmed signalling mechanism, we can see that the metaheuristic
nature of the signalling algorithm rests on the behaviours of exploration
and exploitation.
Exploration in the metaheuristic algorithm is achieved,
in the case of the above signalling algorithm, by the use of randomization,
which gives the algorithm the ability to jump out of any local state
of coupling between two or more signalling
nodes so as to search for more couplings in the network over a
potentially larger area.
Randomization is also used for local search around the
current state if the signalling steps are limited to a local region. When the signalling
steps are long, randomization then allows for exploration of the search
space on a larger scale. Fine-tuning the right amount of randomness and
balancing local search and global search are crucially important in controlling
the performance of the synchronizing metaheuristic algorithm.
Exploitation is the use of local knowledge of the search
and of the solutions found so far, so that new search moves can concentrate on the
local regions or neighbourhoods where an optimum may be close; however, this
local optimum may not be the global one. Exploitation tends to use
strong local information such as gradients, the shape of the mode (such as
convexity), and the history of the search process.
Observations, using simulations, of the convergence
behaviour of common optimisation algorithms suggest that exploitation tends to
increase the speed of convergence, while exploration tends to decrease the
convergence rate of the algorithm. On the other hand, more exploration
increases the probability of finding the global optimum, while strong
exploitation tends to trap the algorithm in a local optimum.
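To make the trapping effect concrete, here is a purely exploitative counterpart to the earlier simulated-annealing sketch, on the same assumed landscape. With no randomization term, it cannot climb out of the first hole it falls into:

f  = @(x) x.^4 - 4*x.^2 + x;      % same illustrative landscape as the earlier sketch
df = @(x) 4*x.^3 - 8*x + 1;       % its gradient
x = 2;                            % same starting point, in the shallow basin
for k = 1:2000
    x = x - 0.01*df(x);           % always move downhill; no randomization term
end
% x settles at the shallow local optimum near x ≈ 1.35, not the global one near x ≈ -1.47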
The relationship between exploration and exploitation
can be seen with a simple implementation of the metaheuristic derived
above, in the context of sensory exploration and contrastive-learning exploitation
based on the activation signalling function.
The sample code is implemented as:
N = 10000;                          % number of sensing iterations (assumed value)
n = 50;                             % contrastive (regression) window length (assumed value)
x = ones(N,1);                      % underlying signal: the instinct to exploit (assumed)
sigmaz2 = 0.1*ones(N,1);            % internal noise scale (assumed value)
Y = zeros(n,1);                     % regression vector: memory of recent inputs
z = zeros(N,1); y = zeros(N,1); xhat = zeros(N,1);
for k = 1:N                         % generate sensed signals, given that the initial values are instincts to exploit
    z(k,1) = sigmaz2(k,1)*(0.5 - randn);   % generate internal signalling system based on activation function A(r)
    y(k,1) = x(k,1) + z(k,1);              % generate input, i.e. an external signal that is sensed - exploration
    Y = [y(k,1); Y(1:(n-1),1)];            % learn by contrast, i.e. shift regression vector and load in new value - exploitation (or endowment)
    xhat(k,1) = mean(Y);                   % get the mean of the contrastive learning procedure
end
We plot the exploration and exploitation convergence plots for 100, 1,000 and 10,000
iterations:
[Figures: convergence of xhat toward x after 100 iterations, 1,000 iterations, and 10,000 iterations]
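Such plots can be reproduced from the sample code with a few lines, assuming xhat and x from the loop above are still in scope:

figure;
plot(1:N, xhat, 1:N, x, '--');     % contrastive estimate versus underlying signal
xlabel('iteration k'); ylabel('signal value');
legend('xhat (exploitation estimate)', 'x (underlying signal)');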
As seen and discussed, there is a fine balance between
the right amount of exploration and the right degree of exploitation. Despite
its importance, there is no known practical guideline, apart from generic tuning, for this
balance as regards learning behaviour.
Nor does it seem practical, in any sense, to find such a guideline
by simply reading convergence plots alone: they are simply data, and data that is at least
partially recalcitrant, since the overall behaviour of the system that follows the algorithm has not itself
been pre-programmed, but nor is the system based entirely on probability.
Therefore, it makes more sense to think about the behaviour and data we empirically observe using an abstract model of a physical
system, one that reduces the behaviour, otherwise taken for granted, to a simple signalling
action principle based on the environmental parameters of the system. We will explore the consequences of this further and search for different ways of understanding metaheuristics in the
realm of information theory, in particular in the behaviour of systems which do not arise from prior programming.
Next Part: Integrating the Concept of Meta-heuristics into Neural Networks