evotorch.optimizers
Optimizers (like Adam or ClipUp) to be used with distributionbased search algorithms.
Adam (TorchOptimizer)
¶
The Adam optimizer.
Reference:
Kingma, D. P. and J. Ba (2015).
Adam: A method for stochastic optimization.
In Proceedings of 3rd International Conference on Learning Representations.
Source code in evotorch/optimizers.py
class Adam(TorchOptimizer):
"""
The Adam optimizer.
Reference:
Kingma, D. P. and J. Ba (2015).
Adam: A method for stochastic optimization.
In Proceedings of 3rd International Conference on Learning Representations.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
device: Device = "cpu",
stepsize: Optional[float] = None,
beta1: Optional[float] = None,
beta2: Optional[float] = None,
epsilon: Optional[float] = None,
amsgrad: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the Adam optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
device: The device on which the solutions are kept.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
beta1: The beta1 hyperparameter. None means the default.
beta2: The beta2 hyperparameter. None means the default.
epsilon: The epsilon hyperparameters. None means the default.
amsgrad: Whether or not to use the amsgrad behavior.
None means the default behavior.
See `torch.optim.Adam` for details.
"""
config = {}
if stepsize is not None:
config["lr"] = float(stepsize)
if beta1 is None and beta2 is None:
pass # nothing to do
elif beta1 is not None and beta2 is not None:
config["betas"] = (float(beta1), float(beta2))
else:
raise ValueError(
"The arguments beta1 and beta2 were expected"
" as both None, or as both real numbers."
" However, one of them was encountered as None and"
" the other was encountered as something other than None."
)
if epsilon is not None:
config["eps"] = float(epsilon)
if amsgrad is not None:
config["amsgrad"] = bool(amsgrad)
super().__init__(torch.optim.Adam, solution_length=solution_length, dtype=dtype, device=device, config=config)
__init__(self, *, solution_length, dtype, device='cpu', stepsize=None, beta1=None, beta2=None, epsilon=None, amsgrad=None)
special
¶
__init__(...)
: Initialize the Adam optimizer.
Parameters:
Name  Type  Description  Default 

solution_length 
int 
Length of a solution of the problem which is being worked on. 
required 
dtype 
Union[str, torch.dtype, numpy.dtype, Type] 
The dtype of the problem which is being worked on. 
required 
device 
Union[str, torch.device] 
The device on which the solutions are kept. 
'cpu' 
stepsize 
Optional[float] 
The step size (i.e. the learning rate) employed by the optimizer. 
None 
beta1 
Optional[float] 
The beta1 hyperparameter. None means the default. 
None 
beta2 
Optional[float] 
The beta2 hyperparameter. None means the default. 
None 
epsilon 
Optional[float] 
The epsilon hyperparameters. None means the default. 
None 
amsgrad 
Optional[bool] 
Whether or not to use the amsgrad behavior.
None means the default behavior.
See 
None 
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
device: Device = "cpu",
stepsize: Optional[float] = None,
beta1: Optional[float] = None,
beta2: Optional[float] = None,
epsilon: Optional[float] = None,
amsgrad: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the Adam optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
device: The device on which the solutions are kept.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
beta1: The beta1 hyperparameter. None means the default.
beta2: The beta2 hyperparameter. None means the default.
epsilon: The epsilon hyperparameters. None means the default.
amsgrad: Whether or not to use the amsgrad behavior.
None means the default behavior.
See `torch.optim.Adam` for details.
"""
config = {}
if stepsize is not None:
config["lr"] = float(stepsize)
if beta1 is None and beta2 is None:
pass # nothing to do
elif beta1 is not None and beta2 is not None:
config["betas"] = (float(beta1), float(beta2))
else:
raise ValueError(
"The arguments beta1 and beta2 were expected"
" as both None, or as both real numbers."
" However, one of them was encountered as None and"
" the other was encountered as something other than None."
)
if epsilon is not None:
config["eps"] = float(epsilon)
if amsgrad is not None:
config["amsgrad"] = bool(amsgrad)
super().__init__(torch.optim.Adam, solution_length=solution_length, dtype=dtype, device=device, config=config)
ClipUp
¶
The ClipUp optimizer.
Although this optimizer has the very same interface with SGD and Adam, it is not a PyTorch optimizer. Therefore, it does not inherit from TorchOptimizer.
Reference:
Toklu, N. E., Liskowski, P., & Srivastava, R. K. (2020, September).
ClipUp: A Simple and Powerful Optimizer for DistributionBased Policy Evolution.
In International Conference on Parallel Problem Solving from Nature (pp. 515527).
Springer, Cham.
Source code in evotorch/optimizers.py
class ClipUp:
"""
The ClipUp optimizer.
Although this optimizer has the very same interface with SGD and Adam,
it is not a PyTorch optimizer. Therefore, it does not inherit from
TorchOptimizer.
Reference:
Toklu, N. E., Liskowski, P., & Srivastava, R. K. (2020, September).
ClipUp: A Simple and Powerful Optimizer for DistributionBased Policy Evolution.
In International Conference on Parallel Problem Solving from Nature (pp. 515527).
Springer, Cham.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
momentum: float = 0.9,
max_speed: Optional[float] = None,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the ClipUp optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
momentum: The momentum coefficient. None means the default.
max_speed: The maximum speed. If given as None, the
`max_speed` will be taken as two times the stepsize.
device: The device on which the solutions are kept.
"""
stepsize = float(stepsize)
momentum = float(momentum)
if max_speed is None:
max_speed = stepsize * 2.0
else:
max_speed = float(max_speed)
solution_length = int(solution_length)
if stepsize < 0.0:
raise ValueError(f"Invalid stepsize: {stepsize}")
if momentum < 0.0 or momentum > 1.0:
raise ValueError(f"Invalid momentum: {momentum}")
if max_speed < 0.0:
raise ValueError(f"Invalid max_speed: {max_speed}")
self._stepsize = stepsize
self._momentum = momentum
self._max_speed = max_speed
self._velocity: Optional[torch.Tensor] = torch.zeros(
solution_length, dtype=to_torch_dtype(dtype), device=device
)
self._dtype = to_torch_dtype(dtype)
self._device = device
@staticmethod
def _clip(x: torch.Tensor, limit: float) > torch.Tensor:
with torch.no_grad():
normx = torch.norm(x)
if normx > limit:
ratio = limit / normx
return x * ratio
else:
return x
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) > torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Important: if you set `cloned_result=False`, and do inplace
modifications on the returned result of `ascent(...)`, then
the internal velocity of ClipUp will be corrupted!
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._velocity),
dtype=self._dtype,
device=self._device,
about=f"{type(self).__name__}.ascent",
)
grad = (globalg / torch.norm(globalg)) * self._stepsize
self._velocity = self._clip((self._momentum * self._velocity) + grad, self._max_speed)
result = self._velocity
if cloned_result:
result = result.clone()
return result
__init__(self, *, solution_length, dtype, stepsize, momentum=0.9, max_speed=None, device='cpu')
special
¶
__init__(...)
: Initialize the ClipUp optimizer.
Parameters:
Name  Type  Description  Default 

solution_length 
int 
Length of a solution of the problem which is being worked on. 
required 
dtype 
Union[str, torch.dtype, numpy.dtype, Type] 
The dtype of the problem which is being worked on. 
required 
stepsize 
float 
The step size (i.e. the learning rate) employed by the optimizer. 
required 
momentum 
float 
The momentum coefficient. None means the default. 
0.9 
max_speed 
Optional[float] 
The maximum speed. If given as None, the

None 
device 
Union[str, torch.device] 
The device on which the solutions are kept. 
'cpu' 
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
momentum: float = 0.9,
max_speed: Optional[float] = None,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the ClipUp optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
momentum: The momentum coefficient. None means the default.
max_speed: The maximum speed. If given as None, the
`max_speed` will be taken as two times the stepsize.
device: The device on which the solutions are kept.
"""
stepsize = float(stepsize)
momentum = float(momentum)
if max_speed is None:
max_speed = stepsize * 2.0
else:
max_speed = float(max_speed)
solution_length = int(solution_length)
if stepsize < 0.0:
raise ValueError(f"Invalid stepsize: {stepsize}")
if momentum < 0.0 or momentum > 1.0:
raise ValueError(f"Invalid momentum: {momentum}")
if max_speed < 0.0:
raise ValueError(f"Invalid max_speed: {max_speed}")
self._stepsize = stepsize
self._momentum = momentum
self._max_speed = max_speed
self._velocity: Optional[torch.Tensor] = torch.zeros(
solution_length, dtype=to_torch_dtype(dtype), device=device
)
self._dtype = to_torch_dtype(dtype)
self._device = device
ascent(self, globalg, *, cloned_result=True)
¶
Compute the ascent, i.e. the step to follow.
Parameters:
Name  Type  Description  Default 

globalg 
Union[float, Iterable[float], torch.Tensor] 
The estimated gradient. 
required 
cloned_result 
bool 
If 
True 
Returns:
Type  Description 

Tensor 
The ascent vector, representing the step to follow. 
Source code in evotorch/optimizers.py
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) > torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Important: if you set `cloned_result=False`, and do inplace
modifications on the returned result of `ascent(...)`, then
the internal velocity of ClipUp will be corrupted!
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._velocity),
dtype=self._dtype,
device=self._device,
about=f"{type(self).__name__}.ascent",
)
grad = (globalg / torch.norm(globalg)) * self._stepsize
self._velocity = self._clip((self._momentum * self._velocity) + grad, self._max_speed)
result = self._velocity
if cloned_result:
result = result.clone()
return result
SGD (TorchOptimizer)
¶
The SGD optimizer.
Reference regarding the momentum behavior:
Polyak, B. T. (1964).
Some methods of speeding up the convergence of iteration methods.
USSR Computational Mathematics and Mathematical Physics, 4(5):1–17.
Reference regarding the Nesterov behavior:
Yurii Nesterov (1983).
A method for unconstrained convex minimization problem with the rate ofconvergence o(1/k2).
Doklady ANSSSR (translated as Soviet.Math.Docl.), 269:543–547.
Source code in evotorch/optimizers.py
class SGD(TorchOptimizer):
"""
The SGD optimizer.
Reference regarding the momentum behavior:
Polyak, B. T. (1964).
Some methods of speeding up the convergence of iteration methods.
USSR Computational Mathematics and Mathematical Physics, 4(5):1–17.
Reference regarding the Nesterov behavior:
Yurii Nesterov (1983).
A method for unconstrained convex minimization problem with the rate ofconvergence o(1/k2).
Doklady ANSSSR (translated as Soviet.Math.Docl.), 269:543–547.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
device: Device = "cpu",
momentum: Optional[float] = None,
dampening: Optional[bool] = None,
nesterov: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the SGD optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
device: The device on which the solutions are kept.
momentum: The momentum coefficient. None means the default.
dampening: Whether or not to activate the dampening behavior.
None means the default.
See `torch.optim.SGD` for details.
nesterov: Whether or not to activate the nesterov behavior.
None means the default.
See `torch.optim.SGD` for details.
"""
config = {}
config["lr"] = float(stepsize)
if momentum is not None:
config["momentum"] = float(momentum)
if dampening is not None:
config["dampening"] = float(dampening)
if nesterov is not None:
config["nesterov"] = bool(nesterov)
super().__init__(torch.optim.SGD, solution_length=solution_length, dtype=dtype, device=device, config=config)
__init__(self, *, solution_length, dtype, stepsize, device='cpu', momentum=None, dampening=None, nesterov=None)
special
¶
__init__(...)
: Initialize the SGD optimizer.
Parameters:
Name  Type  Description  Default 

solution_length 
int 
Length of a solution of the problem which is being worked on. 
required 
dtype 
Union[str, torch.dtype, numpy.dtype, Type] 
The dtype of the problem which is being worked on. 
required 
stepsize 
float 
The step size (i.e. the learning rate) employed by the optimizer. 
required 
device 
Union[str, torch.device] 
The device on which the solutions are kept. 
'cpu' 
momentum 
Optional[float] 
The momentum coefficient. None means the default. 
None 
dampening 
Optional[bool] 
Whether or not to activate the dampening behavior.
None means the default.
See 
None 
nesterov 
Optional[bool] 
Whether or not to activate the nesterov behavior.
None means the default.
See 
None 
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
device: Device = "cpu",
momentum: Optional[float] = None,
dampening: Optional[bool] = None,
nesterov: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the SGD optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
device: The device on which the solutions are kept.
momentum: The momentum coefficient. None means the default.
dampening: Whether or not to activate the dampening behavior.
None means the default.
See `torch.optim.SGD` for details.
nesterov: Whether or not to activate the nesterov behavior.
None means the default.
See `torch.optim.SGD` for details.
"""
config = {}
config["lr"] = float(stepsize)
if momentum is not None:
config["momentum"] = float(momentum)
if dampening is not None:
config["dampening"] = float(dampening)
if nesterov is not None:
config["nesterov"] = bool(nesterov)
super().__init__(torch.optim.SGD, solution_length=solution_length, dtype=dtype, device=device, config=config)
TorchOptimizer
¶
Base class for using a PyTorch optimizer
Source code in evotorch/optimizers.py
class TorchOptimizer:
"""
Base class for using a PyTorch optimizer
"""
def __init__(
self,
torch_optimizer: Type,
*,
config: dict,
solution_length: int,
dtype: DType,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the TorchOptimizer.
Args:
torch_optimizer: The class which represents a PyTorch optimizer.
config: The configuration dictionary to be passed to the optimizer
as keyword arguments.
solution_length: Length of a solution of the problem on which the
optimizer will work.
dtype: The dtype of the problem.
device: The device on which the solutions are kept.
"""
self._data = torch.empty(int(solution_length), dtype=to_torch_dtype(dtype), device=device)
self._optim = torch_optimizer([self._data], **config)
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) > torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._data),
dtype=self._data.dtype,
device=self._data.device,
about=f"{type(self).__name__}.ascent",
)
self._data.zero_()
self._data.grad = globalg
self._optim.step()
result = 1.0 * self._data
return result
__init__(self, torch_optimizer, *, config, solution_length, dtype, device='cpu')
special
¶
__init__(...)
: Initialize the TorchOptimizer.
Parameters:
Name  Type  Description  Default 

torch_optimizer 
Type 
The class which represents a PyTorch optimizer. 
required 
config 
dict 
The configuration dictionary to be passed to the optimizer as keyword arguments. 
required 
solution_length 
int 
Length of a solution of the problem on which the optimizer will work. 
required 
dtype 
Union[str, torch.dtype, numpy.dtype, Type] 
The dtype of the problem. 
required 
device 
Union[str, torch.device] 
The device on which the solutions are kept. 
'cpu' 
Source code in evotorch/optimizers.py
def __init__(
self,
torch_optimizer: Type,
*,
config: dict,
solution_length: int,
dtype: DType,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the TorchOptimizer.
Args:
torch_optimizer: The class which represents a PyTorch optimizer.
config: The configuration dictionary to be passed to the optimizer
as keyword arguments.
solution_length: Length of a solution of the problem on which the
optimizer will work.
dtype: The dtype of the problem.
device: The device on which the solutions are kept.
"""
self._data = torch.empty(int(solution_length), dtype=to_torch_dtype(dtype), device=device)
self._optim = torch_optimizer([self._data], **config)
ascent(self, globalg, *, cloned_result=True)
¶
Compute the ascent, i.e. the step to follow.
Parameters:
Name  Type  Description  Default 

globalg 
Union[float, Iterable[float], torch.Tensor] 
The estimated gradient. 
required 
cloned_result 
bool 
If 
True 
Returns:
Type  Description 

Tensor 
The ascent vector, representing the step to follow. 
Source code in evotorch/optimizers.py
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) > torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._data),
dtype=self._data.dtype,
device=self._data.device,
about=f"{type(self).__name__}.ascent",
)
self._data.zero_()
self._data.grad = globalg
self._optim.step()
result = 1.0 * self._data
return result
get_optimizer_class(s, optimizer_config=None)
¶
Get the optimizer class from the given string.
Parameters:
Name  Type  Description  Default 

s 
str 
A string, referring to the optimizer class. "clipsgd", "clipsga", "clipup" refers to ClipUp. "adam" refers to Adam. "sgd" or "sga" refers to SGD. 
required 
optimizer_config 
Optional[dict] 
A dictionary containing the configurations to be passed to the optimizer. If this argument is not None, then, instead of the class being referred to, a dynamically generated factory function will be returned, which will pass these configurations to the actual class upon being called. 
None 
Returns:
Type  Description 

Callable 
The class, or a factory function instantiating that class. 
Source code in evotorch/optimizers.py
def get_optimizer_class(s: str, optimizer_config: Optional[dict] = None) > Callable:
"""
Get the optimizer class from the given string.
Args:
s: A string, referring to the optimizer class.
"clipsgd", "clipsga", "clipup" refers to ClipUp.
"adam" refers to Adam.
"sgd" or "sga" refers to SGD.
optimizer_config: A dictionary containing the configurations to be
passed to the optimizer. If this argument is not None,
then, instead of the class being referred to, a dynamically
generated factory function will be returned, which will pass
these configurations to the actual class upon being called.
Returns:
The class, or a factory function instantiating that class.
"""
if s in ("clipsgd", "clipsga", "clipup"):
cls = ClipUp
elif s == "adam":
cls = Adam
elif s in ("sgd", "sga"):
cls = SGD
else:
raise ValueError(f"Unknown optimizer: {repr(s)}")
if optimizer_config is None:
return cls
else:
def f(*args, **kwargs):
nonlocal cls, optimizer_config
conf = {}
conf.update(optimizer_config)
conf.update(kwargs)
return cls(*args, **conf)
return f