Cmaes

This namespace contains the CMAES class.

CMAES (SearchAlgorithm, SinglePopulationAlgorithmMixin)

This is a GPU-accelerated and vectorized implementation, based on pycma (version r3.2.2) and the references below.

References:

Nikolaus Hansen, Youhei Akimoto, and Petr Baudis.
CMA-ES/pycma on Github. Zenodo, DOI:10.5281/zenodo.2559634,
February 2019.
<https://github.com/CMA-ES/pycma>

Nikolaus Hansen, Andreas Ostermeier (2001).
Completely Derandomized Self-Adaptation in Evolution Strategies.

Nikolaus Hansen (2016).
The CMA Evolution Strategy: A Tutorial.
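
For orientation, here is a minimal usage sketch (not part of the module source; the sphere objective, its bounds, and the printed status key are illustrative and may vary by EvoTorch version):

import torch

from evotorch import Problem
from evotorch.algorithms import CMAES

def sphere(x: torch.Tensor) -> torch.Tensor:
    # A simple quadratic objective, used here only for illustration
    return torch.sum(x ** 2)

problem = Problem("min", sphere, solution_length=10, initial_bounds=(-1.0, 1.0))
searcher = CMAES(problem, stdev_init=0.5)
searcher.run(100)  # run for 100 generations
print(searcher.status["center"])  # current center of the search distribution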
Source code in evotorch/algorithms/cmaes.py
class CMAES(SearchAlgorithm, SinglePopulationAlgorithmMixin):
    """
    CMAES: Covariance Matrix Adaptation Evolution Strategy.

    This is a GPU-accelerated and vectorized implementation, based on pycma (version r3.2.2)
    and the references below.

    References:

        Nikolaus Hansen, Youhei Akimoto, and Petr Baudis.
        CMA-ES/pycma on Github. Zenodo, DOI:10.5281/zenodo.2559634,
        February 2019.
        <https://github.com/CMA-ES/pycma>

        Nikolaus Hansen, Andreas Ostermeier (2001).
        Completely Derandomized Self-Adaptation in Evolution Strategies.

        Nikolaus Hansen (2016).
        The CMA Evolution Strategy: A Tutorial.

    """

    def __init__(
        self,
        problem: Problem,
        *,
        stdev_init: Real,
        popsize: Optional[int] = None,
        center_init: Optional[Vector] = None,
        c_m: Real = 1.0,
        c_sigma: Optional[Real] = None,
        c_sigma_ratio: Real = 1.0,
        damp_sigma: Optional[Real] = None,
        damp_sigma_ratio: Real = 1.0,
        c_c: Optional[Real] = None,
        c_c_ratio: Real = 1.0,
        c_1: Optional[Real] = None,
        c_1_ratio: Real = 1.0,
        c_mu: Optional[Real] = None,
        c_mu_ratio: Real = 1.0,
        active: bool = True,
        csa_squared: bool = False,
        stdev_min: Optional[Real] = None,
        stdev_max: Optional[Real] = None,
        separable: bool = False,
        limit_C_decomposition: bool = True,
        obj_index: Optional[int] = None,
    ):
        """
        `__init__(...)`: Initialize the CMAES solver.

        Args:
            problem (Problem): The problem object which is being worked on.
            stdev_init (Real): Initial step-size
            popsize: Population size. Can be specified as an int,
                or can be left as None in which case the CMA-ES rule of thumb is applied:
                popsize = 4 + floor(3 log d) where d is the dimension
            center_init: Initial center point of the search distribution.
                Can be given as a Solution or as a 1-D array.
                If left as None, an initial center point is generated
                with the help of the problem object's `generate_values(...)`
                method.
            c_m (Real): Learning rate for updating the mean
                of the search distribution. By default the value is 1.

            c_sigma (Optional[Real]): Learning rate for updating the step size. If None,
                then the CMA-ES rules of thumb will be applied.
            c_sigma_ratio (Real): Multiplier on the learning rate for the step size.
                If c_sigma has been left as None, this can be used to rescale the default c_sigma value.

            damp_sigma (Optional[Real]): Damping factor for updating the step size. If None,
                then the CMA-ES rules of thumb will be applied.
            damp_sigma_ratio (Real): Multiplier on the damping factor for the step size.
                If damp_sigma has been left as None, this can be used to rescale the default damp_sigma value.

            c_c (Optional[Real]): Learning rate for updating the rank-1 evolution path.
                If None, then the CMA-ES rules of thumb will be applied.
            c_c_ratio (Real): Multiplier on the learning rate for the rank-1 evolution path.
                If c_c has been left as None, this can be used to rescale the default c_c value.

            c_1 (Optional[Real]): Learning rate for the rank-1 update to the covariance matrix.
                If None, then the CMA-ES rules of thumb will be applied.
            c_1_ratio (Real): Multiplier on the learning rate for the rank-1 update to the covariance matrix.
                If c_1 has been left as None, this can be used to rescale the default c_1 value.

            c_mu (Optional[Real]): Learning rate for the rank-mu update to the covariance matrix.
                If None, then the CMA-ES rules of thumb will be applied.
            c_mu_ratio (Real): Multiplier on the learning rate for the rank-mu update to the covariance matrix.
                If c_mu has been left as None, this can be used to rescale the default c_mu value.

            active (bool): Whether to use Active CMA-ES. Defaults to True, consistent with the tutorial paper and pycma.
            csa_squared (bool): Whether to use the squared rule ("CSA_squared" in pycma) for the step-size adaptation.
                This effectively corresponds to taking the natural gradient for the evolution path on the step size,
                rather than the default CMA-ES rule of thumb.

            stdev_min (Optional[Real]): Minimum allowed standard deviation of the search
                distribution. Leaving this as None means that no such
                boundary is to be used.
                Can be given as None or as a scalar.
            stdev_max (Optional[Real]): Maximum allowed standard deviation of the search
                distribution. Leaving this as None means that no such
                boundary is to be used.
                Can be given as None or as a scalar.

            separable (bool): Provide this as True if you would like the problem
                to be treated as a separable one. Treating a problem
                as separable means to adapt only the diagonal parts
                of the covariance matrix and to keep the non-diagonal
                parts 0. High dimensional problems result in large
                covariance matrices on which operating is computationally
                expensive. Therefore, for such high dimensional problems,
                setting `separable` as True might be useful.

            limit_C_decomposition (bool): Whether to limit the frequency of decomposition of the shape matrix C.
                Setting this to True (default) means that C will not be decomposed every generation.
                This degrades the quality of the sampling and updates, but provides a guarantee of O(d^2) time complexity per generation.
                This option can be used with separable=True (e.g. for experimental reasons), but the performance will only degrade
                without any time-complexity benefits.


            obj_index (Optional[int]): Objective index according to which evaluation
                of the solution will be done.
        """

        # Initialize the base class
        SearchAlgorithm.__init__(self, problem, center=self._get_center, stepsize=self._get_sigma)

        # Ensure that the problem is numeric
        problem.ensure_numeric()

        # Store the objective index
        self._obj_index = problem.normalize_obj_index(obj_index)

        # Track d = solution length for reference in initialization of hyperparameters
        d = self._problem.solution_length

        # === Initialize population ===
        if not popsize:
            # Default value used in the CMA-ES literature: 4 + floor(3 log d)
            popsize = 4 + int(np.floor(3 * np.log(d)))
        self.popsize = int(popsize)
        # Half popsize, referred to as mu in CMA-ES literature
        self.mu = int(np.floor(popsize / 2))
        self._population = problem.generate_batch(popsize=popsize)

        # === Initialize search distribution ===

        self.separable = separable

        # If `center_init` is not given, generate an initial solution
        # with the help of the problem object.
        # If it is given as a Solution, then clone the solution's values
        # as a PyTorch tensor.
        # Otherwise, use the given initial solution as the starting
        # point in the search space.
        if center_init is None:
            center_init = self._problem.generate_values(1)
        elif isinstance(center_init, Solution):
            center_init = center_init.values.clone()

        # Store the center
        self.m = self._problem.make_tensor(center_init)

        # Store the initial step size
        self.sigma = self._problem.make_tensor(stdev_init)

        if separable:
            # Initialize C as the diagonal vector. Note that when separable, the eigendecomposition is not needed
            self.C = self._problem.make_ones(d)
            # In this case A is simply the square root of elements of C
            self.A = self._problem.make_ones(d)
        else:
            # Initialize C = AA^T, with both C and A starting as the identity matrix.
            self.C = self._problem.make_I(d)
            self.A = self.C.clone()

        # === Initialize raw weights ===
        # Conditioned on popsize

        # w_i = log((lambda + 1) / 2) - log(i) for i = 1 ... lambda
        raw_weights = self.problem.make_tensor(np.log((popsize + 1) / 2) - torch.log(torch.arange(popsize) + 1))
        # positive valued weights are the first mu
        positive_weights = raw_weights[: self.mu]
        negative_weights = raw_weights[self.mu :]

        # Variance effective selection mass of positive weights
        # Not affected by future updates to raw_weights
        self.mu_eff = torch.sum(positive_weights).pow(2.0) / torch.sum(positive_weights.pow(2.0))

        # === Initialize search parameters ===
        # Conditioned on weights

        # Store fixed information
        self.c_m = c_m
        self.active = active
        self.csa_squared = csa_squared
        self.stdev_min = stdev_min
        self.stdev_max = stdev_max

        # Learning rate for step-size adaptation
        if c_sigma is None:
            c_sigma = (self.mu_eff + 2.0) / (d + self.mu_eff + 3)
        self.c_sigma = c_sigma_ratio * c_sigma

        # Damping factor for step-size adaptation
        if damp_sigma is None:
            damp_sigma = 1 + 2 * max(0, torch.sqrt((self.mu_eff - 1) / (d + 1)) - 1) + self.c_sigma
        self.damp_sigma = damp_sigma_ratio * damp_sigma

        # Learning rate for evolution path for rank-1 update
        if c_c is None:
            # Branches on separability
            if separable:
                c_c = (1 + (1 / d) + (self.mu_eff / d)) / (d**0.5 + (1 / d) + 2 * (self.mu_eff / d))
            else:
                c_c = (4 + self.mu_eff / d) / (d + (4 + 2 * self.mu_eff / d))
        self.c_c = c_c_ratio * c_c

        # Learning rate for rank-1 update to covariance matrix
        if c_1 is None:
            # Branches on separability
            if separable:
                c_1 = 1.0 / (d + 2.0 * np.sqrt(d) + self.mu_eff / d)
            else:
                c_1 = min(1, popsize / 6) * 2 / ((d + 1.3) ** 2.0 + self.mu_eff)
        self.c_1 = c_1_ratio * c_1

        # Learning rate for rank-mu update to covariance matrix
        if c_mu is None:
            # Branches on separability
            if separable:
                c_mu = (0.25 + self.mu_eff + (1.0 / self.mu_eff) - 2) / (d + 4 * np.sqrt(d) + (self.mu_eff / 2.0))
            else:
                c_mu = min(
                    1 - self.c_1, 2 * ((0.25 + self.mu_eff - 2 + (1 / self.mu_eff)) / ((d + 2) ** 2.0 + self.mu_eff))
                )
        self.c_mu = c_mu_ratio * c_mu

        # The 'variance aware' coefficient used for the additive component of the evolution path for sigma
        self.variance_discount_sigma = torch.sqrt(self.c_sigma * (2 - self.c_sigma) * self.mu_eff)
        # The 'variance aware' coefficient used for the additive component of the evolution path for rank-1 updates
        self.variance_discount_c = torch.sqrt(self.c_c * (2 - self.c_c) * self.mu_eff)

        # === Finalize weights ===
        # Conditioned on search parameters and raw weights

        # Positive weights always sum to 1
        positive_weights = positive_weights / torch.sum(positive_weights)

        if self.active:
            # Active CMA-ES: negative weights sum to alpha

            # Get the variance effective selection mass of negative weights
            mu_eff_neg = torch.sum(negative_weights).pow(2.0) / torch.sum(negative_weights.pow(2.0))

            # Alpha is the minimum of the following 3 terms
            alpha_mu = 1 + self.c_1 / self.c_mu
            alpha_mu_eff = 1 + 2 * mu_eff_neg / (self.mu_eff + 2)
            alpha_pos_def = (1 - self.c_mu - self.c_1) / (d * self.c_mu)
            alpha = min([alpha_mu, alpha_mu_eff, alpha_pos_def])

            # Rescale negative weights
            negative_weights = alpha * negative_weights / torch.sum(torch.abs(negative_weights))
        else:
            # Negative weights are simply zero
            negative_weights = torch.zeros_like(negative_weights)

        # Concatenate final weights
        self.weights = torch.cat([positive_weights, negative_weights], dim=-1)

        # === Some final setup ===

        # Initialize the evolution paths
        self.p_sigma = 0.0
        self.p_c = 0.0

        # Hansen's approximation to the expectation of ||x|| where x ~ N(0, I_d).
        # Note that we could use the exact formulation with Gamma functions, but we'll retain this form for consistency
        self.unbiased_expectation = np.sqrt(d) * (1 - (1 / (4 * d)) + 1 / (21 * d**2))

        self.last_ex = None

        # How often to decompose C
        if limit_C_decomposition:
            self.decompose_C_freq = max(1, int(np.floor(_safe_divide(1, 10 * d * (self.c_1.cpu() + self.c_mu.cpu())))))
        else:
            self.decompose_C_freq = 1

        # Use the SinglePopulationAlgorithmMixin to enable additional status reports regarding the population.
        SinglePopulationAlgorithmMixin.__init__(self)

    @property
    def population(self) -> SolutionBatch:
        """Population generated by the CMA-ES algorithm"""
        return self._population

    def _get_center(self) -> torch.Tensor:
        """Get the center of search distribution, m"""
        return self.m

    def _get_sigma(self) -> float:
        """Get the step-size of the search distribution, sigma"""
        return float(self.sigma.cpu())

    @property
    def obj_index(self) -> int:
        """Index of the objective being focused on"""
        return self._obj_index

    def sample_distribution(self, num_samples: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Sample the population. All 3 representations of solutions are returned for easy calculations of updates.
        Note that the computation time of this operation is O(d^2 num_samples) unless separable, in which case it is O(d num_samples)
        Args:
            num_samples (Optional[int]): The number of samples to draw. If None, then the population size is used
        Returns:
            zs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
            ys (torch.Tensor): A tensor of shape [num_samples, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
            xs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the search space, i.e. x_i ~ N(m, sigma^2 C)
        """
        if num_samples is None:
            num_samples = self.popsize
        # Generate z values
        zs = self._problem.make_gaussian(num_solutions=num_samples)
        # Construct ys = A zs
        if self.separable:
            # In the separable case A is diagonal so is represented as a single vector
            ys = self.A.unsqueeze(0) * zs
        else:
            ys = (self.A @ zs.T).T
        # Construct xs = m + sigma ys
        xs = self.m.unsqueeze(0) + self.sigma * ys
        return zs, ys, xs

    def get_population_weights(self, xs: torch.Tensor) -> torch.Tensor:
        """Get the assigned weights of the population (e.g. evaluate, rank and return)
        Args:
            xs (torch.Tensor): The population samples drawn from N(mu, sigma^2 C)
        Returns:
            assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
        """
        # Computation is O(popsize * F_time) where F_time is the evaluation time per sample
        # Fill the population
        self._population.set_values(xs)
        # Evaluate the current population
        self.problem.evaluate(self._population)
        # Sort the population
        indices = self._population.argsort(obj_index=self.obj_index)
        # Invert the sorting of the population to obtain the ranks
        # Note that these ranks start at zero, but this is fine as we are just using them for indexing
        ranks = torch.zeros_like(indices)
        ranks[indices] = torch.arange(self.popsize, dtype=indices.dtype, device=indices.device)
        # Get weights corresponding to each rank
        assigned_weights = self.weights[ranks]
        return assigned_weights

    def update_m(self, zs: torch.Tensor, ys: torch.Tensor, assigned_weights: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Update the center of the search distribution m
        With zs and ys retained from sampling, this operation is O(popsize d), as it involves summing across popsize d-dimensional vectors.
        Args:
            zs (torch.Tensor): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
            ys (torch.Tensor): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
            assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
        Returns:
            local_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the local transformation of m,
                (1/sigma) (C^-1/2) (m' - m) where m' is the updated m
            shaped_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the shaped transformation of m,
                (1/sigma) (m' - m) where m' is the updated m
        """
        # Get the top-mu weights
        top_mu = torch.topk(assigned_weights, k=self.mu)
        top_mu_weights = top_mu.values
        top_mu_indices = top_mu.indices

        # Compute the weighted recombination in local coordinate space
        local_m_displacement = torch.sum(top_mu_weights.unsqueeze(-1) * zs[top_mu_indices], dim=0)
        # Compute the weighted recombination in shaped coordinate space
        shaped_m_displacement = torch.sum(top_mu_weights.unsqueeze(-1) * ys[top_mu_indices], dim=0)

        # Update m
        self.m = self.m + self.c_m * self.sigma * shaped_m_displacement

        # Return the weighted recombinations
        return local_m_displacement, shaped_m_displacement

    def update_p_sigma(self, local_m_displacement: torch.Tensor) -> None:
        """Update the evolution path for sigma, p_sigma
        This operation is O(d), as it is simply a sum of vectors
        Args:
            local_m_displacement (torch.Tensor): The weighted recombination of local samples zs, corresponding to
                (1/sigma) (C^-1/2) (m' - m) where m' is the updated m
        """
        self.p_sigma = (1 - self.c_sigma) * self.p_sigma + self.variance_discount_sigma * local_m_displacement

    def update_sigma(self) -> None:
        """Update the step size sigma according to its evolution path p_sigma
        This operation is O(d), with the most expensive component being the norm of the evolution path, a d-dimensional vector.
        """
        d = self._problem.solution_length
        # Compute the exponential update
        if self.csa_squared:
            # Exponential update based on natural gradient maximizing squared norm of p_sigma
            exponential_update = (torch.norm(self.p_sigma).pow(2.0) / d - 1) / 2
        else:
            # Exponential update increasing the likelihood of p_sigma having its expected norm
            exponential_update = torch.norm(self.p_sigma) / self.unbiased_expectation - 1
        # Rescale exponential update based on learning rate + damping factor
        exponential_update = (self.c_sigma / self.damp_sigma) * exponential_update
        # Multiplicative update to sigma
        self.sigma = self.sigma * torch.exp(exponential_update)

    def update_p_c(self, shaped_m_displacement: torch.Tensor, h_sig: torch.Tensor) -> None:
        """Update the evolution path for rank-1 update, p_c
        This operation is bounded O(d), as is simply the sum of vectors
        Args:
            shaped_m_displacement (torch.Tensor): The weighted recombination of shaped samples ys, corresponding to
                (1/sigma) (m' - m) where m' is the updated m
            h_sig (torch.Tensor): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float
        """
        self.p_c = (1 - self.c_c) * self.p_c + h_sig * self.variance_discount_c * shaped_m_displacement

    def update_C(self, zs: torch.Tensor, ys: torch.Tensor, assigned_weights: torch.Tensor, h_sig: torch.Tensor) -> None:
        """Update the covariance shape matrix C based on rank-1 and rank-mu updates
        This operation is O(d^2 popsize), which is associated with computing the rank-mu update (summing across popsize d*d matrices)
        Args:
            zs (torch.Tensor): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
            ys (torch.Tensor): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
            assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
            h_sig (torch.Tensor): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float
        """
        d = self._problem.solution_length
        # If using Active CMA-ES, reweight negative weights
        if self.active:
            assigned_weights = torch.where(
                assigned_weights > 0, assigned_weights, d * assigned_weights / torch.norm(zs, dim=-1).pow(2.0)
            )
        c1a = self.c_1 * (1 - (1 - h_sig**2) * self.c_c * (2 - self.c_c))  # adjust for variance loss
        weighted_pc = (self.c_1 / (c1a + 1e-23)) ** 0.5
        if self.separable:
            # Rank-1 update
            r1_update = c1a * (self.p_c.pow(2.0) - self.C)
            # Rank-mu update
            rmu_update = self.c_mu * torch.sum(
                assigned_weights.unsqueeze(-1) * (ys.pow(2.0) - self.C.unsqueeze(0)), dim=0
            )
        else:
            # Rank-1 update
            r1_update = c1a * (torch.outer(weighted_pc * self.p_c, weighted_pc * self.p_c) - self.C)
            # Rank-mu update
            rmu_update = self.c_mu * (
                torch.sum(assigned_weights.unsqueeze(-1).unsqueeze(-1) * (ys.unsqueeze(1) * ys.unsqueeze(2)), dim=0)
                - torch.sum(self.weights) * self.C
            )

        # Update C
        self.C = self.C + r1_update + rmu_update

    def decompose_C(self) -> None:
        """Perform the decomposition C = AA^T using a cholesky decomposition
        Note that traditionally CMA-ES uses the eigendecomposition C = BDDB^-1. In our case,
        we keep track of zs, ys and xs when sampling, so we never need C^-1/2.
        Therefore, a cholesky decomposition is all that is necessary. This generally requires
        O(d^3/3) operations, rather than the more costly O(d^3) operations associated with the eigendecomposition.
        """
        if self.separable:
            self.A = self.C.pow(0.5)
        else:
            self.A = torch.linalg.cholesky(self.C)

    def _step(self):
        """Perform a step of the CMA-ES solver"""

        # === Sampling, evaluation and ranking ===

        # Sample the search distribution
        zs, ys, xs = self.sample_distribution()
        # Get the weights assigned to each solution
        assigned_weights = self.get_population_weights(xs)

        # === Center adaptation ===

        local_m_displacement, shaped_m_displacement = self.update_m(zs, ys, assigned_weights)

        # === Step-size adaptation ===

        # Update evolution path p_sigma
        self.update_p_sigma(local_m_displacement)
        # Update sigma
        self.update_sigma()

        # Compute h_sig, a boolean flag for stalling the update to p_c
        h_sig = _h_sig(self.p_sigma, self.c_sigma, self._steps_count)

        # === Unscaled covariance adaptation ===

        # Update evolution path p_c
        self.update_p_c(shaped_m_displacement, h_sig)
        # Update the covariance shape C
        self.update_C(zs, ys, assigned_weights, h_sig)

        # === Post-step corrections ===

        # Limit element-wise standard deviation of sigma^2 C
        if self.stdev_min is not None or self.stdev_max is not None:
            self.C = _limit_stdev(self.sigma, self.C, self.stdev_min, self.stdev_max)

        # Decompose C
        if (self._steps_count + 1) % self.decompose_C_freq == 0:
            self.decompose_C()

obj_index: int property readonly

Index of the objective being focused on

population: SolutionBatch property readonly

Population generated by the CMA-ES algorithm

__init__(self, problem, *, stdev_init, popsize=None, center_init=None, c_m=1.0, c_sigma=None, c_sigma_ratio=1.0, damp_sigma=None, damp_sigma_ratio=1.0, c_c=None, c_c_ratio=1.0, c_1=None, c_1_ratio=1.0, c_mu=None, c_mu_ratio=1.0, active=True, csa_squared=False, stdev_min=None, stdev_max=None, separable=False, limit_C_decomposition=True, obj_index=None) special

__init__(...): Initialize the CMAES solver.

Parameters:

    problem (Problem, required): The problem object which is being worked on.
    stdev_init (Real, required): Initial step-size.
    popsize (Optional[int], default None): Population size. Can be specified as an int, or can be left as None, in which case the CMA-ES rule of thumb is applied: popsize = 4 + floor(3 log d), where d is the dimension.
    center_init (Union[Iterable[float], torch.Tensor], default None): Initial center point of the search distribution. Can be given as a Solution or as a 1-D array. If left as None, an initial center point is generated with the help of the problem object's generate_values(...) method.
    c_m (Real, default 1.0): Learning rate for updating the mean of the search distribution. By default the value is 1.
    c_sigma (Optional[Real], default None): Learning rate for updating the step size. If None, then the CMA-ES rules of thumb will be applied.
    c_sigma_ratio (Real, default 1.0): Multiplier on the learning rate for the step size. If c_sigma has been left as None, this can be used to rescale the default c_sigma value.
    damp_sigma (Optional[Real], default None): Damping factor for updating the step size. If None, then the CMA-ES rules of thumb will be applied.
    damp_sigma_ratio (Real, default 1.0): Multiplier on the damping factor for the step size. If damp_sigma has been left as None, this can be used to rescale the default damp_sigma value.
    c_c (Optional[Real], default None): Learning rate for updating the rank-1 evolution path. If None, then the CMA-ES rules of thumb will be applied.
    c_c_ratio (Real, default 1.0): Multiplier on the learning rate for the rank-1 evolution path. If c_c has been left as None, this can be used to rescale the default c_c value.
    c_1 (Optional[Real], default None): Learning rate for the rank-1 update to the covariance matrix. If None, then the CMA-ES rules of thumb will be applied.
    c_1_ratio (Real, default 1.0): Multiplier on the learning rate for the rank-1 update to the covariance matrix. If c_1 has been left as None, this can be used to rescale the default c_1 value.
    c_mu (Optional[Real], default None): Learning rate for the rank-mu update to the covariance matrix. If None, then the CMA-ES rules of thumb will be applied.
    c_mu_ratio (Real, default 1.0): Multiplier on the learning rate for the rank-mu update to the covariance matrix. If c_mu has been left as None, this can be used to rescale the default c_mu value.
    active (bool, default True): Whether to use Active CMA-ES. Defaults to True, consistent with the tutorial paper and pycma.
    csa_squared (bool, default False): Whether to use the squared rule ("CSA_squared" in pycma) for the step-size adaptation. This effectively corresponds to taking the natural gradient for the evolution path on the step size, rather than the default CMA-ES rule of thumb.
    stdev_min (Optional[Real], default None): Minimum allowed standard deviation of the search distribution. Leaving this as None means that no such boundary is to be used. Can be given as None or as a scalar.
    stdev_max (Optional[Real], default None): Maximum allowed standard deviation of the search distribution. Leaving this as None means that no such boundary is to be used. Can be given as None or as a scalar.
    separable (bool, default False): Provide this as True if you would like the problem to be treated as a separable one. Treating a problem as separable means to adapt only the diagonal parts of the covariance matrix and to keep the non-diagonal parts 0. High-dimensional problems result in large covariance matrices on which operating is computationally expensive. Therefore, for such high-dimensional problems, setting separable as True might be useful.
    limit_C_decomposition (bool, default True): Whether to limit the frequency of decomposition of the shape matrix C. Setting this to True (default) means that C will not be decomposed every generation. This degrades the quality of the sampling and updates, but provides a guarantee of O(d^2) time complexity per generation. This option can be used with separable=True (e.g. for experimental reasons), but the performance will only degrade without any time-complexity benefits.
    obj_index (Optional[int], default None): Objective index according to which evaluation of the solution will be done.
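
To make the rules of thumb above concrete, the following standalone sketch (illustrative only; d is an assumed solution length) computes the default popsize, mu, and the variance effective selection mass mu_eff in the same way as the initializer:

import numpy as np

d = 20                                      # solution_length, chosen here for illustration
popsize = 4 + int(np.floor(3 * np.log(d)))  # rule of thumb: 4 + floor(3 log d) -> 12
mu = int(np.floor(popsize / 2))             # number of positive ("parent") weights -> 6

# Raw weights: w_i = log((popsize + 1) / 2) - log(i) for i = 1 ... popsize
raw_weights = np.log((popsize + 1) / 2) - np.log(np.arange(1, popsize + 1))
positive_weights = raw_weights[:mu]

# Variance effective selection mass of the positive weights
mu_eff = positive_weights.sum() ** 2 / (positive_weights ** 2).sum()
print(popsize, mu, mu_eff)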
Source code in evotorch/algorithms/cmaes.py
def __init__(
    self,
    problem: Problem,
    *,
    stdev_init: Real,
    popsize: Optional[int] = None,
    center_init: Optional[Vector] = None,
    c_m: Real = 1.0,
    c_sigma: Optional[Real] = None,
    c_sigma_ratio: Real = 1.0,
    damp_sigma: Optional[Real] = None,
    damp_sigma_ratio: Real = 1.0,
    c_c: Optional[Real] = None,
    c_c_ratio: Real = 1.0,
    c_1: Optional[Real] = None,
    c_1_ratio: Real = 1.0,
    c_mu: Optional[Real] = None,
    c_mu_ratio: Real = 1.0,
    active: bool = True,
    csa_squared: bool = False,
    stdev_min: Optional[Real] = None,
    stdev_max: Optional[Real] = None,
    separable: bool = False,
    limit_C_decomposition: bool = True,
    obj_index: Optional[int] = None,
):
    """
    `__init__(...)`: Initialize the CMAES solver.

    Args:
        problem (Problem): The problem object which is being worked on.
        stdev_init (Real): Initial step-size
        popsize: Population size. Can be specified as an int,
            or can be left as None in which case the CMA-ES rule of thumb is applied:
            popsize = 4 + floor(3 log d) where d is the dimension
        center_init: Initial center point of the search distribution.
            Can be given as a Solution or as a 1-D array.
            If left as None, an initial center point is generated
            with the help of the problem object's `generate_values(...)`
            method.
        c_m (Real): Learning rate for updating the mean
            of the search distribution. By default the value is 1.

        c_sigma (Optional[Real]): Learning rate for updating the step size. If None,
            then the CMA-ES rules of thumb will be applied.
        c_sigma_ratio (Real): Multiplier on the learning rate for the step size.
            If c_sigma has been left as None, this can be used to rescale the default c_sigma value.

        damp_sigma (Optional[Real]): Damping factor for updating the step size. If None,
            then the CMA-ES rules of thumb will be applied.
        damp_sigma_ratio (Real): Multiplier on the damping factor for the step size.
            If damp_sigma has been left as None, this can be used to rescale the default damp_sigma value.

        c_c (Optional[Real]): Learning rate for updating the rank-1 evolution path.
            If None, then the CMA-ES rules of thumb will be applied.
        c_c_ratio (Real): Multiplier on the learning rate for the rank-1 evolution path.
            If c_c has been left as None, this can be used to rescale the default c_c value.

        c_1 (Optional[Real]): Learning rate for the rank-1 update to the covariance matrix.
            If None, then the CMA-ES rules of thumb will be applied.
        c_1_ratio (Real): Multiplier on the learning rate for the rank-1 update to the covariance matrix.
            If c_1 has been left as None, this can be used to rescale the default c_1 value.

        c_mu (Optional[Real]): Learning rate for the rank-mu update to the covariance matrix.
            If None, then the CMA-ES rules of thumb will be applied.
        c_mu_ratio (Real): Multiplier on the learning rate for the rank-mu update to the covariance matrix.
            If c_mu has been left as None, this can be used to rescale the default c_mu value.

        active (bool): Whether to use Active CMA-ES. Defaults to True, consistent with the tutorial paper and pycma.
        csa_squared (bool): Whether to use the squared rule ("CSA_squared" in pycma) for the step-size adaptation.
            This effectively corresponds to taking the natural gradient for the evolution path on the step size,
            rather than the default CMA-ES rule of thumb.

        stdev_min (Optional[Real]): Minimum allowed standard deviation of the search
            distribution. Leaving this as None means that no such
            boundary is to be used.
            Can be given as None or as a scalar.
        stdev_max (Optional[Real]): Maximum allowed standard deviation of the search
            distribution. Leaving this as None means that no such
            boundary is to be used.
            Can be given as None or as a scalar.

        separable (bool): Provide this as True if you would like the problem
            to be treated as a separable one. Treating a problem
            as separable means to adapt only the diagonal parts
            of the covariance matrix and to keep the non-diagonal
            parts 0. High dimensional problems result in large
            covariance matrices on which operating is computationally
            expensive. Therefore, for such high dimensional problems,
            setting `separable` as True might be useful.

        limit_C_decomposition (bool): Whether to limit the frequency of decomposition of the shape matrix C.
            Setting this to True (default) means that C will not be decomposed every generation.
            This degrades the quality of the sampling and updates, but provides a guarantee of O(d^2) time complexity per generation.
            This option can be used with separable=True (e.g. for experimental reasons), but the performance will only degrade
            without any time-complexity benefits.


        obj_index (Optional[int]): Objective index according to which evaluation
            of the solution will be done.
    """

    # Initialize the base class
    SearchAlgorithm.__init__(self, problem, center=self._get_center, stepsize=self._get_sigma)

    # Ensure that the problem is numeric
    problem.ensure_numeric()

    # Store the objective index
    self._obj_index = problem.normalize_obj_index(obj_index)

    # Track d = solution length for reference in initialization of hyperparameters
    d = self._problem.solution_length

    # === Initialize population ===
    if not popsize:
        # Default value used in the CMA-ES literature: 4 + floor(3 log d)
        popsize = 4 + int(np.floor(3 * np.log(d)))
    self.popsize = int(popsize)
    # Half popsize, referred to as mu in CMA-ES literature
    self.mu = int(np.floor(popsize / 2))
    self._population = problem.generate_batch(popsize=popsize)

    # === Initialize search distribution ===

    self.separable = separable

    # If `center_init` is not given, generate an initial solution
    # with the help of the problem object.
    # If it is given as a Solution, then clone the solution's values
    # as a PyTorch tensor.
    # Otherwise, use the given initial solution as the starting
    # point in the search space.
    if center_init is None:
        center_init = self._problem.generate_values(1)
    elif isinstance(center_init, Solution):
        center_init = center_init.values.clone()

    # Store the center
    self.m = self._problem.make_tensor(center_init)

    # Store the initial step size
    self.sigma = self._problem.make_tensor(stdev_init)

    if separable:
        # Initialize C as the diagonal vector. Note that when separable, the eigendecomposition is not needed
        self.C = self._problem.make_ones(d)
        # In this case A is simply the square root of elements of C
        self.A = self._problem.make_ones(d)
    else:
        # Initialize C = AA^T, with both C and A starting as the identity matrix.
        self.C = self._problem.make_I(d)
        self.A = self.C.clone()

    # === Initialize raw weights ===
    # Conditioned on popsize

    # w_i = log((lambda + 1) / 2) - log(i) for i = 1 ... lambda
    raw_weights = self.problem.make_tensor(np.log((popsize + 1) / 2) - torch.log(torch.arange(popsize) + 1))
    # positive valued weights are the first mu
    positive_weights = raw_weights[: self.mu]
    negative_weights = raw_weights[self.mu :]

    # Variance effective selection mass of positive weights
    # Not affected by future updates to raw_weights
    self.mu_eff = torch.sum(positive_weights).pow(2.0) / torch.sum(positive_weights.pow(2.0))

    # === Initialize search parameters ===
    # Conditioned on weights

    # Store fixed information
    self.c_m = c_m
    self.active = active
    self.csa_squared = csa_squared
    self.stdev_min = stdev_min
    self.stdev_max = stdev_max

    # Learning rate for step-size adaptation
    if c_sigma is None:
        c_sigma = (self.mu_eff + 2.0) / (d + self.mu_eff + 3)
    self.c_sigma = c_sigma_ratio * c_sigma

    # Damping factor for step-size adaptation
    if damp_sigma is None:
        damp_sigma = 1 + 2 * max(0, torch.sqrt((self.mu_eff - 1) / (d + 1)) - 1) + self.c_sigma
    self.damp_sigma = damp_sigma_ratio * damp_sigma

    # Learning rate for evolution path for rank-1 update
    if c_c is None:
        # Branches on separability
        if separable:
            c_c = (1 + (1 / d) + (self.mu_eff / d)) / (d**0.5 + (1 / d) + 2 * (self.mu_eff / d))
        else:
            c_c = (4 + self.mu_eff / d) / (d + (4 + 2 * self.mu_eff / d))
    self.c_c = c_c_ratio * c_c

    # Learning rate for rank-1 update to covariance matrix
    if c_1 is None:
        # Branches on separability
        if separable:
            c_1 = 1.0 / (d + 2.0 * np.sqrt(d) + self.mu_eff / d)
        else:
            c_1 = min(1, popsize / 6) * 2 / ((d + 1.3) ** 2.0 + self.mu_eff)
    self.c_1 = c_1_ratio * c_1

    # Learning rate for rank-mu update to covariance matrix
    if c_mu is None:
        # Branches on separability
        if separable:
            c_mu = (0.25 + self.mu_eff + (1.0 / self.mu_eff) - 2) / (d + 4 * np.sqrt(d) + (self.mu_eff / 2.0))
        else:
            c_mu = min(
                1 - self.c_1, 2 * ((0.25 + self.mu_eff - 2 + (1 / self.mu_eff)) / ((d + 2) ** 2.0 + self.mu_eff))
            )
    self.c_mu = c_mu_ratio * c_mu

    # The 'variance aware' coefficient used for the additive component of the evolution path for sigma
    self.variance_discount_sigma = torch.sqrt(self.c_sigma * (2 - self.c_sigma) * self.mu_eff)
    # The 'variance aware' coefficient used for the additive component of the evolution path for rank-1 updates
    self.variance_discount_c = torch.sqrt(self.c_c * (2 - self.c_c) * self.mu_eff)

    # === Finalize weights ===
    # Conditioned on search parameters and raw weights

    # Positive weights always sum to 1
    positive_weights = positive_weights / torch.sum(positive_weights)

    if self.active:
        # Active CMA-ES: negative weights sum to alpha

        # Get the variance effective selection mass of negative weights
        mu_eff_neg = torch.sum(negative_weights).pow(2.0) / torch.sum(negative_weights.pow(2.0))

        # Alpha is the minimum of the following 3 terms
        alpha_mu = 1 + self.c_1 / self.c_mu
        alpha_mu_eff = 1 + 2 * mu_eff_neg / (self.mu_eff + 2)
        alpha_pos_def = (1 - self.c_mu - self.c_1) / (d * self.c_mu)
        alpha = min([alpha_mu, alpha_mu_eff, alpha_pos_def])

        # Rescale negative weights
        negative_weights = alpha * negative_weights / torch.sum(torch.abs(negative_weights))
    else:
        # Negative weights are simply zero
        negative_weights = torch.zeros_like(negative_weights)

    # Concatenate final weights
    self.weights = torch.cat([positive_weights, negative_weights], dim=-1)

    # === Some final setup ===

    # Initialize the evolution paths
    self.p_sigma = 0.0
    self.p_c = 0.0

    # Hansen's approximation to the expectation of ||x|| where x ~ N(0, I_d).
    # Note that we could use the exact formulation with Gamma functions, but we'll retain this form for consistency
    self.unbiased_expectation = np.sqrt(d) * (1 - (1 / (4 * d)) + 1 / (21 * d**2))

    self.last_ex = None

    # How often to decompose C
    if limit_C_decomposition:
        self.decompose_C_freq = max(1, int(np.floor(_safe_divide(1, 10 * d * (self.c_1.cpu() + self.c_mu.cpu())))))
    else:
        self.decompose_C_freq = 1

    # Use the SinglePopulationAlgorithmMixin to enable additional status reports regarding the population.
    SinglePopulationAlgorithmMixin.__init__(self)

decompose_C(self)

Perform the decomposition C = AA^T using a Cholesky decomposition. Note that traditionally CMA-ES uses the eigendecomposition C = BDDB^-1. In our case, we keep track of zs, ys and xs when sampling, so we never need C^-1/2. Therefore, a Cholesky decomposition is all that is necessary. This generally requires O(d^3/3) operations, rather than the more costly O(d^3) operations associated with the eigendecomposition.
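
A quick self-contained check of this idea (illustrative; C here is a random SPD matrix, not the algorithm's state): sampling with the Cholesky factor A reproduces the covariance C, since if z ~ N(0, I) then Az ~ N(0, AA^T) = N(0, C).

import torch

d = 5
M = torch.randn(d, d)
C = M @ M.T + d * torch.eye(d)   # a random symmetric positive-definite matrix
A = torch.linalg.cholesky(C)     # lower-triangular factor satisfying C = A A^T
assert torch.allclose(A @ A.T, C, atol=1e-5)

zs = torch.randn(100_000, d)     # z_i ~ N(0, I_d)
ys = zs @ A.T                    # y_i = A z_i, hence y_i ~ N(0, A A^T) = N(0, C)
print((ys.T @ ys) / len(ys))     # empirical covariance, approximately equal to C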

Source code in evotorch/algorithms/cmaes.py
def decompose_C(self) -> None:
    """Perform the decomposition C = AA^T using a cholesky decomposition
    Note that traditionally CMA-ES uses the eigendecomposition C = BDDB^-1. In our case,
    we keep track of zs, ys and xs when sampling, so we never need C^-1/2.
    Therefore, a cholesky decomposition is all that is necessary. This generally requires
    O(d^3/3) operations, rather than the more costly O(d^3) operations associated with the eigendecomposition.
    """
    if self.separable:
        self.A = self.C.pow(0.5)
    else:
        self.A = torch.linalg.cholesky(self.C)

get_population_weights(self, xs)

Get the assigned weights of the population (i.e. evaluate, rank and return).

Parameters:

    xs (torch.Tensor, required): The population samples drawn from N(mu, sigma^2 C).

Returns:

    assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights.
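
The rank-inversion trick used in the body below can be illustrated in isolation (fitness and weight values are placeholders; minimization is assumed, so lower fitness is better):

import torch

weights = torch.tensor([0.6, 0.3, 0.1, 0.0])    # best rank receives the largest weight
fitnesses = torch.tensor([3.0, 1.0, 4.0, 2.0])  # lower is better (minimization assumed)

indices = torch.argsort(fitnesses)              # [1, 3, 0, 2]: solutions ordered by rank
ranks = torch.zeros_like(indices)
ranks[indices] = torch.arange(len(fitnesses))   # rank of each solution in original order
print(ranks)                                    # tensor([2, 0, 3, 1])
print(weights[ranks])                           # tensor([0.1000, 0.6000, 0.0000, 0.3000])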

Source code in evotorch/algorithms/cmaes.py
def get_population_weights(self, xs: torch.Tensor) -> torch.Tensor:
    """Get the assigned weights of the population (e.g. evaluate, rank and return)
    Args:
        xs (torch.Tensor): The population samples drawn from N(mu, sigma^2 C)
    Returns:
        assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
    """
    # Computation is O(popsize * F_time) where F_time is the evaluation time per sample
    # Fill the population
    self._population.set_values(xs)
    # Evaluate the current population
    self.problem.evaluate(self._population)
    # Sort the population
    indices = self._population.argsort(obj_index=self.obj_index)
    # Invert the sorting of the population to obtain the ranks
    # Note that these ranks start at zero, but this is fine as we are just using them for indexing
    ranks = torch.zeros_like(indices)
    ranks[indices] = torch.arange(self.popsize, dtype=indices.dtype, device=indices.device)
    # Get weights corresponding to each rank
    assigned_weights = self.weights[ranks]
    return assigned_weights

sample_distribution(self, num_samples=None)

Sample the population. All 3 representations of solutions are returned for easy calculations of updates. Note that the computation time of this operation is O(d^2 num_samples) unless separable, in which case it is O(d num_samples).

Parameters:

    num_samples (Optional[int], default None): The number of samples to draw. If None, then the population size is used.

Returns:

    zs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d).
    ys (torch.Tensor): A tensor of shape [num_samples, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C).
    xs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the search space, i.e. x_i ~ N(m, sigma^2 C).
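
As a shape-level illustration (all values are placeholders), the three representations relate as follows; note that in the separable case A is stored as a d-vector, so the matrix product reduces to an elementwise product:

import torch

d, n = 4, 3
m = torch.zeros(d)                      # center m (placeholder)
sigma = torch.tensor(0.5)               # step size (placeholder)
A_full = torch.eye(d)                   # full factor: a [d, d] matrix
A_sep = torch.ones(d)                   # separable factor: a [d] vector of std devs

zs = torch.randn(n, d)                  # z_i ~ N(0, I_d)
ys_full = (A_full @ zs.T).T             # y_i ~ N(0, C) with C = A A^T; O(n d^2)
ys_sep = A_sep.unsqueeze(0) * zs        # same result for diagonal C; O(n d)
xs = m.unsqueeze(0) + sigma * ys_full   # x_i ~ N(m, sigma^2 C)
print(xs.shape)                         # torch.Size([3, 4])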

Source code in evotorch/algorithms/cmaes.py
def sample_distribution(self, num_samples: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Sample the population. All 3 representations of solutions are returned for easy calculations of updates.
    Note that the computation time of this operation is O(d^2 num_samples) unless separable, in which case it is O(d num_samples)
    Args:
        num_samples (Optional[int]): The number of samples to draw. If None, then the population size is used
    Returns:
        zs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
        ys (torch.Tensor): A tensor of shape [num_samples, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
        xs (torch.Tensor): A tensor of shape [num_samples, d] of samples from the search space, i.e. x_i ~ N(m, sigma^2 C)
    """
    if num_samples is None:
        num_samples = self.popsize
    # Generate z values
    zs = self._problem.make_gaussian(num_solutions=num_samples)
    # Construct ys = A zs
    if self.separable:
        # In the separable case A is diagonal so is represented as a single vector
        ys = self.A.unsqueeze(0) * zs
    else:
        ys = (self.A @ zs.T).T
    # Construct xs = m + sigma ys
    xs = self.m.unsqueeze(0) + self.sigma * ys
    return zs, ys, xs

update_C(self, zs, ys, assigned_weights, h_sig)

Update the covariance shape matrix C based on rank-1 and rank-mu updates. This operation is O(d^2 popsize), which is associated with computing the rank-mu update (summing across popsize d*d matrices).

Parameters:

    zs (torch.Tensor, required): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d).
    ys (torch.Tensor, required): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C).
    assigned_weights (torch.Tensor, required): A [popsize, ] dimensional tensor of ordered weights.
    h_sig (torch.Tensor, required): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float.
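
The rank-mu sum of weighted outer products in the non-separable branch below can equivalently be written as a single einsum, which is a convenient way to sanity-check the broadcasting (shapes and values here are placeholders):

import torch

popsize, d = 8, 5
w = torch.randn(popsize)       # stand-in for the assigned weights
ys = torch.randn(popsize, d)   # stand-in for the shaped samples

# Sum of weighted outer products, broadcast exactly as in the rank-mu update below
outer_sum = torch.sum(w.unsqueeze(-1).unsqueeze(-1) * (ys.unsqueeze(1) * ys.unsqueeze(2)), dim=0)
# Equivalent einsum formulation: sum_k w_k * ys[k, i] * ys[k, j]
outer_sum_einsum = torch.einsum("k,ki,kj->ij", w, ys, ys)
print(torch.allclose(outer_sum, outer_sum_einsum, atol=1e-5))  # True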
Source code in evotorch/algorithms/cmaes.py
def update_C(self, zs: torch.Tensor, ys: torch.Tensor, assigned_weights: torch.Tensor, h_sig: torch.Tensor) -> None:
    """Update the covariance shape matrix C based on rank-1 and rank-mu updates
    This operation is O(d^2 popsize), which is associated with computing the rank-mu update (summing across popsize d*d matrices)
    Args:
        zs (torch.Tensor): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
        ys (torch.Tensor): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
        assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
        h_sig (torch.Tensor): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float
    """
    d = self._problem.solution_length
    # If using Active CMA-ES, reweight negative weights
    if self.active:
        assigned_weights = torch.where(
            assigned_weights > 0, assigned_weights, d * assigned_weights / torch.norm(zs, dim=-1).pow(2.0)
        )
    c1a = self.c_1 * (1 - (1 - h_sig**2) * self.c_c * (2 - self.c_c))  # adjust for variance loss
    weighted_pc = (self.c_1 / (c1a + 1e-23)) ** 0.5
    if self.separable:
        # Rank-1 update
        r1_update = c1a * (self.p_c.pow(2.0) - self.C)
        # Rank-mu update
        rmu_update = self.c_mu * torch.sum(
            assigned_weights.unsqueeze(-1) * (ys.pow(2.0) - self.C.unsqueeze(0)), dim=0
        )
    else:
        # Rank-1 update
        r1_update = c1a * (torch.outer(weighted_pc * self.p_c, weighted_pc * self.p_c) - self.C)
        # Rank-mu update
        rmu_update = self.c_mu * (
            torch.sum(assigned_weights.unsqueeze(-1).unsqueeze(-1) * (ys.unsqueeze(1) * ys.unsqueeze(2)), dim=0)
            - torch.sum(self.weights) * self.C
        )

    # Update C
    self.C = self.C + r1_update + rmu_update

update_m(self, zs, ys, assigned_weights)

Update the center of the search distribution m. With zs and ys retained from sampling, this operation is O(popsize d), as it involves summing across popsize d-dimensional vectors.

Parameters:

    zs (torch.Tensor, required): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d).
    ys (torch.Tensor, required): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C).
    assigned_weights (torch.Tensor, required): A [popsize, ] dimensional tensor of ordered weights.

Returns:

    local_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the local transformation of m, (1/sigma) (C^-1/2) (m' - m), where m' is the updated m.
    shaped_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the shaped transformation of m, (1/sigma) (m' - m), where m' is the updated m.
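
The top-mu weighted recombination can be seen in isolation as follows (placeholder values; the weights stand in for the output of get_population_weights):

import torch

popsize, d, mu = 6, 3, 3
assigned_weights = torch.tensor([0.1, 0.5, -0.2, 0.4, -0.3, 0.0])
ys = torch.randn(popsize, d)                 # stand-in for the shaped samples

top_mu = torch.topk(assigned_weights, k=mu)  # the mu largest (positive) weights
# Weighted recombination: sum of w_i * y_i over the top-mu samples
shaped_m_displacement = torch.sum(top_mu.values.unsqueeze(-1) * ys[top_mu.indices], dim=0)
print(shaped_m_displacement.shape)           # torch.Size([3])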

Source code in evotorch/algorithms/cmaes.py
def update_m(self, zs: torch.Tensor, ys: torch.Tensor, assigned_weights: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Update the center of the search distribution m
    With zs and ys retained from sampling, this operation is O(popsize d), as it involves summing across popsize d-dimensional vectors.
    Args:
        zs (torch.Tensor): A tensor of shape [popsize, d] of samples from the local coordinate space, i.e. z_i ~ N(0, I_d)
        ys (torch.Tensor): A tensor of shape [popsize, d] of samples from the shaped coordinate space, i.e. y_i ~ N(0, C)
        assigned_weights (torch.Tensor): A [popsize, ] dimensional tensor of ordered weights
    Returns:
        local_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the local transformation of m,
            (1/sigma) (C^-1/2) (m' - m) where m' is the updated m
        shaped_m_displacement (torch.Tensor): A tensor of shape [d], corresponding to the shaped transformation of m,
            (1/sigma) (m' - m) where m' is the updated m
    """
    # Get the top-mu weights
    top_mu = torch.topk(assigned_weights, k=self.mu)
    top_mu_weights = top_mu.values
    top_mu_indices = top_mu.indices

    # Compute the weighted recombination in local coordinate space
    local_m_displacement = torch.sum(top_mu_weights.unsqueeze(-1) * zs[top_mu_indices], dim=0)
    # Compute the weighted recombination in shaped coordinate space
    shaped_m_displacement = torch.sum(top_mu_weights.unsqueeze(-1) * ys[top_mu_indices], dim=0)

    # Update m
    self.m = self.m + self.c_m * self.sigma * shaped_m_displacement

    # Return the weighted recombinations
    return local_m_displacement, shaped_m_displacement

update_p_c(self, shaped_m_displacement, h_sig)

Update the evolution path for the rank-1 update, p_c. This operation is O(d), as it is simply a sum of vectors.

Parameters:

    shaped_m_displacement (torch.Tensor, required): The weighted recombination of shaped samples ys, corresponding to (1/sigma) (m' - m), where m' is the updated m.
    h_sig (torch.Tensor, required): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float.
Source code in evotorch/algorithms/cmaes.py
def update_p_c(self, shaped_m_displacement: torch.Tensor, h_sig: torch.Tensor) -> None:
    """Update the evolution path for rank-1 update, p_c
    This operation is bounded O(d), as is simply the sum of vectors
    Args:
        shaped_m_displacement (torch.Tensor): The weighted recombination of shaped samples ys, corresponding to
            (1/sigma) (m' - m) where m' is the updated m
        h_sig (torch.Tensor): Whether to stall the update based on the evolution path on sigma, p_sigma, expressed as a torch float
    """
    self.p_c = (1 - self.c_c) * self.p_c + h_sig * self.variance_discount_c * shaped_m_displacement

update_p_sigma(self, local_m_displacement)

Update the evolution path for sigma, p_sigma. This operation is O(d), as it is simply a sum of vectors.

Parameters:

    local_m_displacement (torch.Tensor, required): The weighted recombination of local samples zs, corresponding to (1/sigma) (C^-1/2) (m' - m), where m' is the updated m.
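
The variance-preserving form of this recurrence can be verified numerically. In the sketch below (standalone and illustrative), the displacement is replaced by an i.i.d. standard normal vector; the mu_eff factor in variance_discount_sigma compensates for the 1/mu_eff variance of the weighted recombination, so with a unit-variance stand-in it is dropped. The path then stays approximately N(0, I_d) at stationarity:

import torch

c_sigma, d, steps = 0.2, 10, 2000
discount = (c_sigma * (2 - c_sigma)) ** 0.5  # mu_eff omitted: the stand-in has unit variance

p_sigma = torch.zeros(d)
sq_norms = []
for _ in range(steps):
    disp = torch.randn(d)  # stand-in for local_m_displacement, here ~ N(0, I_d)
    p_sigma = (1 - c_sigma) * p_sigma + discount * disp
    sq_norms.append(torch.sum(p_sigma ** 2).item())

# At stationarity p_sigma remains ~ N(0, I_d), so its mean squared norm stays near d
print(sum(sq_norms[100:]) / len(sq_norms[100:]))  # approximately 10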
Source code in evotorch/algorithms/cmaes.py
def update_p_sigma(self, local_m_displacement: torch.Tensor) -> None:
    """Update the evolution path for sigma, p_sigma
    This operation is O(d), as it is simply a sum of vectors
    Args:
        local_m_displacement (torch.Tensor): The weighted recombination of local samples zs, corresponding to
            (1/sigma) (C^-1/2) (m' - m) where m' is the updated m
    """
    self.p_sigma = (1 - self.c_sigma) * self.p_sigma + self.variance_discount_sigma * local_m_displacement

update_sigma(self)

Update the step size sigma according to its evolution path p_sigma. This operation is O(d), with the most expensive component being the norm of the evolution path, a d-dimensional vector.
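
The multiplicative CSA update can be sketched standalone (placeholder values; unbiased_expectation follows the same approximation used in __init__): when ||p_sigma|| exceeds the expected norm of a standard normal vector, sigma grows; when it falls short, sigma shrinks.

import numpy as np
import torch

d = 10
c_sigma, damp_sigma = 0.3, 1.0   # placeholder learning rate and damping factor
sigma = torch.tensor(0.5)
p_sigma = 1.5 * torch.randn(d)   # a longer-than-expected path (illustrative)

# Hansen's approximation to E||z|| for z ~ N(0, I_d)
unbiased_expectation = np.sqrt(d) * (1 - 1 / (4 * d) + 1 / (21 * d ** 2))

exponential_update = torch.norm(p_sigma) / unbiased_expectation - 1
sigma = sigma * torch.exp((c_sigma / damp_sigma) * exponential_update)
print(float(sigma))              # typically > 0.5, since the path is longer than expected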

Source code in evotorch/algorithms/cmaes.py
def update_sigma(self) -> None:
    """Update the step size sigma according to its evolution path p_sigma
    This operation is O(d), with the most expensive component being the norm of the evolution path, a d-dimensional vector.
    """
    d = self._problem.solution_length
    # Compute the exponential update
    if self.csa_squared:
        # Exponential update based on natural gradient maximizing squared norm of p_sigma
        exponential_update = (torch.norm(self.p_sigma).pow(2.0) / d - 1) / 2
    else:
        # Exponential update increasing the likelihood of p_sigma having its expected norm
        exponential_update = torch.norm(self.p_sigma) / self.unbiased_expectation - 1
    # Rescale exponential update based on learning rate + damping factor
    exponential_update = (self.c_sigma / self.damp_sigma) * exponential_update
    # Multiplicative update to sigma
    self.sigma = self.sigma * torch.exp(exponential_update)