evotorch.optimizers
Optimizers (like Adam or ClipUp) to be used with distribution-based search algorithms.
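All of the optimizers below expose the same `ascent(...)` interface: given an estimated gradient, they return the step vector to follow. A minimal sketch of that usage pattern is shown below, assuming evotorch and PyTorch are installed; the synthetic gradient and the hyperparameter values are illustrative only.

```python
import torch

from evotorch.optimizers import SGD

solution_length = 20
center = torch.zeros(solution_length)  # current center of the search distribution
optimizer = SGD(solution_length=solution_length, dtype="float32", stepsize=0.1)

for _ in range(100):
    # In a real distribution-based searcher the gradient is estimated from
    # sampled solutions and their fitnesses; here a synthetic gradient of the
    # toy objective f(x) = -0.5 * ||x - 1||^2 stands in as a placeholder.
    estimated_gradient = 1.0 - center
    # `ascent(...)` returns the step to follow; adding it to the center
    # performs a gradient-ascent update.
    center = center + optimizer.ascent(estimated_gradient)
```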
Adam (TorchOptimizer)
The Adam optimizer.
Reference:
Kingma, D. P. and J. Ba (2015).
Adam: A method for stochastic optimization.
In Proceedings of 3rd International Conference on Learning Representations.
Source code in evotorch/optimizers.py
class Adam(TorchOptimizer):
"""
The Adam optimizer.
Reference:
Kingma, D. P. and J. Ba (2015).
Adam: A method for stochastic optimization.
In Proceedings of 3rd International Conference on Learning Representations.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
device: Device = "cpu",
stepsize: Optional[float] = None,
beta1: Optional[float] = None,
beta2: Optional[float] = None,
epsilon: Optional[float] = None,
amsgrad: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the Adam optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
device: The device on which the solutions are kept.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
beta1: The beta1 hyperparameter. None means the default.
beta2: The beta2 hyperparameter. None means the default.
epsilon: The epsilon hyperparameters. None means the default.
amsgrad: Whether or not to use the amsgrad behavior.
None means the default behavior.
See `torch.optim.Adam` for details.
"""
config = {}
if stepsize is not None:
config["lr"] = float(stepsize)
if beta1 is None and beta2 is None:
pass # nothing to do
elif beta1 is not None and beta2 is not None:
config["betas"] = (float(beta1), float(beta2))
else:
raise ValueError(
"The arguments beta1 and beta2 were expected"
" as both None, or as both real numbers."
" However, one of them was encountered as None and"
" the other was encountered as something other than None."
)
if epsilon is not None:
config["eps"] = float(epsilon)
if amsgrad is not None:
config["amsgrad"] = bool(amsgrad)
super().__init__(torch.optim.Adam, solution_length=solution_length, dtype=dtype, device=device, config=config)
__init__(self, *, solution_length, dtype, device='cpu', stepsize=None, beta1=None, beta2=None, epsilon=None, amsgrad=None)
special
__init__(...): Initialize the Adam optimizer.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
solution_length | int | Length of a solution of the problem which is being worked on. | required |
dtype | Union[str, torch.dtype, numpy.dtype, Type] | The dtype of the problem which is being worked on. | required |
device | Union[str, torch.device] | The device on which the solutions are kept. | 'cpu' |
stepsize | Optional[float] | The step size (i.e. the learning rate) employed by the optimizer. | None |
beta1 | Optional[float] | The beta1 hyperparameter. None means the default. | None |
beta2 | Optional[float] | The beta2 hyperparameter. None means the default. | None |
epsilon | Optional[float] | The epsilon hyperparameter. None means the default. | None |
amsgrad | Optional[bool] | Whether or not to use the amsgrad behavior. None means the default behavior. See `torch.optim.Adam` for details. | None |
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
device: Device = "cpu",
stepsize: Optional[float] = None,
beta1: Optional[float] = None,
beta2: Optional[float] = None,
epsilon: Optional[float] = None,
amsgrad: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the Adam optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
device: The device on which the solutions are kept.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
beta1: The beta1 hyperparameter. None means the default.
beta2: The beta2 hyperparameter. None means the default.
epsilon: The epsilon hyperparameters. None means the default.
amsgrad: Whether or not to use the amsgrad behavior.
None means the default behavior.
See `torch.optim.Adam` for details.
"""
config = {}
if stepsize is not None:
config["lr"] = float(stepsize)
if beta1 is None and beta2 is None:
pass # nothing to do
elif beta1 is not None and beta2 is not None:
config["betas"] = (float(beta1), float(beta2))
else:
raise ValueError(
"The arguments beta1 and beta2 were expected"
" as both None, or as both real numbers."
" However, one of them was encountered as None and"
" the other was encountered as something other than None."
)
if epsilon is not None:
config["eps"] = float(epsilon)
if amsgrad is not None:
config["amsgrad"] = bool(amsgrad)
super().__init__(torch.optim.Adam, solution_length=solution_length, dtype=dtype, device=device, config=config)
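A minimal usage sketch for Adam follows, assuming evotorch and PyTorch are installed; the hyperparameter values and the random gradient are illustrative only. Note that `beta1` and `beta2` must be given together, or both left as None.

```python
import torch

from evotorch.optimizers import Adam

adam = Adam(
    solution_length=1000,
    dtype=torch.float32,
    device="cpu",
    stepsize=0.01,  # forwarded to torch.optim.Adam as `lr`
    beta1=0.9,      # beta1 and beta2 must be provided together (or both omitted)
    beta2=0.999,
)

estimated_gradient = torch.randn(1000)
step = adam.ascent(estimated_gradient)  # the step to follow
```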
ClipUp
The ClipUp optimizer.
Although this optimizer has the very same interface as SGD and Adam, it is not a PyTorch optimizer and therefore does not inherit from TorchOptimizer.
Reference:
Toklu, N. E., Liskowski, P., & Srivastava, R. K. (2020, September).
ClipUp: A Simple and Powerful Optimizer for Distribution-Based Policy Evolution.
In International Conference on Parallel Problem Solving from Nature (pp. 515-527).
Springer, Cham.
Source code in evotorch/optimizers.py
class ClipUp:
"""
The ClipUp optimizer.
Although this optimizer has the very same interface with SGD and Adam,
it is not a PyTorch optimizer. Therefore, it does not inherit from
TorchOptimizer.
Reference:
Toklu, N. E., Liskowski, P., & Srivastava, R. K. (2020, September).
ClipUp: A Simple and Powerful Optimizer for Distribution-Based Policy Evolution.
In International Conference on Parallel Problem Solving from Nature (pp. 515-527).
Springer, Cham.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
momentum: float = 0.9,
max_speed: Optional[float] = None,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the ClipUp optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
momentum: The momentum coefficient. None means the default.
max_speed: The maximum speed. If given as None, the
`max_speed` will be taken as two times the stepsize.
device: The device on which the solutions are kept.
"""
stepsize = float(stepsize)
momentum = float(momentum)
if max_speed is None:
max_speed = stepsize * 2.0
else:
max_speed = float(max_speed)
solution_length = int(solution_length)
if stepsize < 0.0:
raise ValueError(f"Invalid stepsize: {stepsize}")
if momentum < 0.0 or momentum > 1.0:
raise ValueError(f"Invalid momentum: {momentum}")
if max_speed < 0.0:
raise ValueError(f"Invalid max_speed: {max_speed}")
self._stepsize = stepsize
self._momentum = momentum
self._max_speed = max_speed
self._velocity: Optional[torch.Tensor] = torch.zeros(
solution_length, dtype=to_torch_dtype(dtype), device=device
)
self._dtype = to_torch_dtype(dtype)
self._device = device
@staticmethod
def _clip(x: torch.Tensor, limit: float) -> torch.Tensor:
with torch.no_grad():
normx = torch.norm(x)
if normx > limit:
ratio = limit / normx
return x * ratio
else:
return x
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) -> torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Important: if you set `cloned_result=False`, and do in-place
modifications on the returned result of `ascent(...)`, then
the internal velocity of ClipUp will be corrupted!
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._velocity),
dtype=self._dtype,
device=self._device,
about=f"{type(self).__name__}.ascent",
)
grad = (globalg / torch.norm(globalg)) * self._stepsize
self._velocity = self._clip((self._momentum * self._velocity) + grad, self._max_speed)
result = self._velocity
if cloned_result:
result = result.clone()
return result
__init__(self, *, solution_length, dtype, stepsize, momentum=0.9, max_speed=None, device='cpu')
special
__init__(...): Initialize the ClipUp optimizer.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
solution_length | int | Length of a solution of the problem which is being worked on. | required |
dtype | Union[str, torch.dtype, numpy.dtype, Type] | The dtype of the problem which is being worked on. | required |
stepsize | float | The step size (i.e. the learning rate) employed by the optimizer. | required |
momentum | float | The momentum coefficient. | 0.9 |
max_speed | Optional[float] | The maximum speed. If given as None, `max_speed` is taken as two times the stepsize. | None |
device | Union[str, torch.device] | The device on which the solutions are kept. | 'cpu' |
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
momentum: float = 0.9,
max_speed: Optional[float] = None,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the ClipUp optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
momentum: The momentum coefficient. None means the default.
max_speed: The maximum speed. If given as None, the
`max_speed` will be taken as two times the stepsize.
device: The device on which the solutions are kept.
"""
stepsize = float(stepsize)
momentum = float(momentum)
if max_speed is None:
max_speed = stepsize * 2.0
else:
max_speed = float(max_speed)
solution_length = int(solution_length)
if stepsize < 0.0:
raise ValueError(f"Invalid stepsize: {stepsize}")
if momentum < 0.0 or momentum > 1.0:
raise ValueError(f"Invalid momentum: {momentum}")
if max_speed < 0.0:
raise ValueError(f"Invalid max_speed: {max_speed}")
self._stepsize = stepsize
self._momentum = momentum
self._max_speed = max_speed
self._velocity: Optional[torch.Tensor] = torch.zeros(
solution_length, dtype=to_torch_dtype(dtype), device=device
)
self._dtype = to_torch_dtype(dtype)
self._device = device
ascent(self, globalg, *, cloned_result=True)
Compute the ascent, i.e. the step to follow.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
globalg | Union[float, Iterable[float], torch.Tensor] | The estimated gradient. | required |
cloned_result | bool | If True, the result is a copy, guaranteed not to be a view of any tensor internal to ClipUp. If False, the result is not a copy; use this only when your algorithm never modifies the ascent vector in place, since in-place modifications would corrupt ClipUp's internal velocity. | True |

Returns:

Type | Description |
---|---|
Tensor | The ascent vector, representing the step to follow. |
Source code in evotorch/optimizers.py
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) -> torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Important: if you set `cloned_result=False`, and do in-place
modifications on the returned result of `ascent(...)`, then
the internal velocity of ClipUp will be corrupted!
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._velocity),
dtype=self._dtype,
device=self._device,
about=f"{type(self).__name__}.ascent",
)
grad = (globalg / torch.norm(globalg)) * self._stepsize
self._velocity = self._clip((self._momentum * self._velocity) + grad, self._max_speed)
result = self._velocity
if cloned_result:
result = result.clone()
return result
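A minimal usage sketch for ClipUp follows, assuming evotorch and PyTorch are installed; the numbers are illustrative. It highlights ClipUp's two defining operations: the estimated gradient is normalized to length `stepsize` before being accumulated into the velocity, and the velocity's norm is clipped at `max_speed` (two times `stepsize` by default).

```python
import torch

from evotorch.optimizers import ClipUp

# max_speed is not given, so it defaults to 2 * stepsize = 0.2
clipup = ClipUp(solution_length=5, dtype="float32", stepsize=0.1, momentum=0.9)

# Only the direction of the estimated gradient matters, because ClipUp
# normalizes it before scaling by the step size.
g = torch.tensor([10.0, 0.0, 0.0, 0.0, 0.0])

step = clipup.ascent(g)
print(step)              # tensor([0.1000, 0.0000, 0.0000, 0.0000, 0.0000])
print(torch.norm(step))  # the first step has norm equal to the step size (0.1)
```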
SGD (TorchOptimizer)
The SGD optimizer.
Reference regarding the momentum behavior:
Polyak, B. T. (1964).
Some methods of speeding up the convergence of iteration methods.
USSR Computational Mathematics and Mathematical Physics, 4(5):1–17.
Reference regarding the Nesterov behavior:
Yurii Nesterov (1983).
A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2).
Doklady AN SSSR (translated as Soviet Math. Dokl.), 269:543–547.
Source code in evotorch/optimizers.py
class SGD(TorchOptimizer):
"""
The SGD optimizer.
Reference regarding the momentum behavior:
Polyak, B. T. (1964).
Some methods of speeding up the convergence of iteration methods.
USSR Computational Mathematics and Mathematical Physics, 4(5):1–17.
Reference regarding the Nesterov behavior:
Yurii Nesterov (1983).
A method for unconstrained convex minimization problem with the rate ofconvergence o(1/k2).
Doklady ANSSSR (translated as Soviet.Math.Docl.), 269:543–547.
"""
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
device: Device = "cpu",
momentum: Optional[float] = None,
dampening: Optional[bool] = None,
nesterov: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the SGD optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
device: The device on which the solutions are kept.
momentum: The momentum coefficient. None means the default.
dampening: Whether or not to activate the dampening behavior.
None means the default.
See `torch.optim.SGD` for details.
nesterov: Whether or not to activate the nesterov behavior.
None means the default.
See `torch.optim.SGD` for details.
"""
config = {}
config["lr"] = float(stepsize)
if momentum is not None:
config["momentum"] = float(momentum)
if dampening is not None:
config["dampening"] = float(dampening)
if nesterov is not None:
config["nesterov"] = bool(nesterov)
super().__init__(torch.optim.SGD, solution_length=solution_length, dtype=dtype, device=device, config=config)
__init__(self, *, solution_length, dtype, stepsize, device='cpu', momentum=None, dampening=None, nesterov=None)
special
__init__(...): Initialize the SGD optimizer.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
solution_length | int | Length of a solution of the problem which is being worked on. | required |
dtype | Union[str, torch.dtype, numpy.dtype, Type] | The dtype of the problem which is being worked on. | required |
stepsize | float | The step size (i.e. the learning rate) employed by the optimizer. | required |
device | Union[str, torch.device] | The device on which the solutions are kept. | 'cpu' |
momentum | Optional[float] | The momentum coefficient. None means the default. | None |
dampening | Optional[bool] | Whether or not to activate the dampening behavior. None means the default. See `torch.optim.SGD` for details. | None |
nesterov | Optional[bool] | Whether or not to activate the nesterov behavior. None means the default. See `torch.optim.SGD` for details. | None |
Source code in evotorch/optimizers.py
def __init__(
self,
*,
solution_length: int,
dtype: DType,
stepsize: float,
device: Device = "cpu",
momentum: Optional[float] = None,
dampening: Optional[bool] = None,
nesterov: Optional[bool] = None,
):
"""
`__init__(...)`: Initialize the SGD optimizer.
Args:
solution_length: Length of a solution of the problem which is
being worked on.
dtype: The dtype of the problem which is being worked on.
stepsize: The step size (i.e. the learning rate) employed
by the optimizer.
device: The device on which the solutions are kept.
momentum: The momentum coefficient. None means the default.
dampening: Whether or not to activate the dampening behavior.
None means the default.
See `torch.optim.SGD` for details.
nesterov: Whether or not to activate the nesterov behavior.
None means the default.
See `torch.optim.SGD` for details.
"""
config = {}
config["lr"] = float(stepsize)
if momentum is not None:
config["momentum"] = float(momentum)
if dampening is not None:
config["dampening"] = float(dampening)
if nesterov is not None:
config["nesterov"] = bool(nesterov)
super().__init__(torch.optim.SGD, solution_length=solution_length, dtype=dtype, device=device, config=config)
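A minimal usage sketch for SGD follows, assuming evotorch and PyTorch are installed; the hyperparameter values are illustrative. Unlike Adam, `stepsize` is required here.

```python
import torch

from evotorch.optimizers import SGD

sgd = SGD(
    solution_length=1000,
    dtype="float32",
    stepsize=0.05,  # required; forwarded to torch.optim.SGD as `lr`
    momentum=0.9,   # optional; omit to keep torch.optim.SGD's default
)

step = sgd.ascent(torch.randn(1000))  # the step to follow
```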
TorchOptimizer
Base class for using a PyTorch optimizer
Source code in evotorch/optimizers.py
class TorchOptimizer:
"""
Base class for using a PyTorch optimizer
"""
def __init__(
self,
torch_optimizer: Type,
*,
config: dict,
solution_length: int,
dtype: DType,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the TorchOptimizer.
Args:
torch_optimizer: The class which represents a PyTorch optimizer.
config: The configuration dictionary to be passed to the optimizer
as keyword arguments.
solution_length: Length of a solution of the problem on which the
optimizer will work.
dtype: The dtype of the problem.
device: The device on which the solutions are kept.
"""
self._data = torch.empty(int(solution_length), dtype=to_torch_dtype(dtype), device=device)
self._optim = torch_optimizer([self._data], **config)
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) -> torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._data),
dtype=self._data.dtype,
device=self._data.device,
about=f"{type(self).__name__}.ascent",
)
self._data.zero_()
self._data.grad = globalg
self._optim.step()
result = -1.0 * self._data
return result
__init__(self, torch_optimizer, *, config, solution_length, dtype, device='cpu')
special
__init__(...): Initialize the TorchOptimizer.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
torch_optimizer | Type | The class which represents a PyTorch optimizer. | required |
config | dict | The configuration dictionary to be passed to the optimizer as keyword arguments. | required |
solution_length | int | Length of a solution of the problem on which the optimizer will work. | required |
dtype | Union[str, torch.dtype, numpy.dtype, Type] | The dtype of the problem. | required |
device | Union[str, torch.device] | The device on which the solutions are kept. | 'cpu' |
Source code in evotorch/optimizers.py
def __init__(
self,
torch_optimizer: Type,
*,
config: dict,
solution_length: int,
dtype: DType,
device: Device = "cpu",
):
"""
`__init__(...)`: Initialize the TorchOptimizer.
Args:
torch_optimizer: The class which represents a PyTorch optimizer.
config: The configuration dictionary to be passed to the optimizer
as keyword arguments.
solution_length: Length of a solution of the problem on which the
optimizer will work.
dtype: The dtype of the problem.
device: The device on which the solutions are kept.
"""
self._data = torch.empty(int(solution_length), dtype=to_torch_dtype(dtype), device=device)
self._optim = torch_optimizer([self._data], **config)
ascent(self, globalg, *, cloned_result=True)
Compute the ascent, i.e. the step to follow.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
globalg | Union[float, Iterable[float], torch.Tensor] | The estimated gradient. | required |
cloned_result | bool | If True, the result is a copy, guaranteed not to be a view of any tensor internal to the TorchOptimizer class. If False, the result is not a copy; use this only when your algorithm never modifies the ascent vector in place. | True |

Returns:

Type | Description |
---|---|
Tensor | The ascent vector, representing the step to follow. |
Source code in evotorch/optimizers.py
@torch.no_grad()
def ascent(self, globalg: RealOrVector, *, cloned_result: bool = True) -> torch.Tensor:
"""
Compute the ascent, i.e. the step to follow.
Args:
globalg: The estimated gradient.
cloned_result: If `cloned_result` is True, then the result is a
copy, guaranteed not to be the view of any other tensor
internal to the TorchOptimizer class.
If `cloned_result` is False, then the result is not a copy.
Use `cloned_result=False` only when you are sure that your
algorithm will never do direct modification on the ascent
vector it receives.
Returns:
The ascent vector, representing the step to follow.
"""
globalg = ensure_tensor_length_and_dtype(
globalg,
len(self._data),
dtype=self._data.dtype,
device=self._data.device,
about=f"{type(self).__name__}.ascent",
)
self._data.zero_()
self._data.grad = globalg
self._optim.step()
result = -1.0 * self._data
return result
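TorchOptimizer can also wrap a PyTorch optimizer that does not have a ready-made subclass in this module. The sketch below uses `torch.optim.RMSprop` purely as an illustration (it is not one of the wrappers provided by evotorch.optimizers), and the configuration values are illustrative.

```python
import torch

from evotorch.optimizers import TorchOptimizer

# Wrap an arbitrary PyTorch optimizer class; `config` is forwarded to its
# constructor as keyword arguments.
rmsprop = TorchOptimizer(
    torch.optim.RMSprop,
    config={"lr": 0.01},
    solution_length=500,
    dtype="float32",
    device="cpu",
)

step = rmsprop.ascent(torch.randn(500))  # the step to follow
```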
get_optimizer_class(s, optimizer_config=None)
Get the optimizer class from the given string.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
s | str | A string, referring to the optimizer class. "clipsgd", "clipsga", and "clipup" refer to ClipUp; "adam" refers to Adam; "sgd" and "sga" refer to SGD. | required |
optimizer_config | Optional[dict] | A dictionary containing the configurations to be passed to the optimizer. If this argument is not None, then, instead of the class itself, a dynamically generated factory function is returned, which passes these configurations to the actual class upon being called. | None |

Returns:

Type | Description |
---|---|
Callable | The class, or a factory function instantiating that class. |
Source code in evotorch/optimizers.py
def get_optimizer_class(s: str, optimizer_config: Optional[dict] = None) -> Callable:
"""
Get the optimizer class from the given string.
Args:
s: A string, referring to the optimizer class.
"clipsgd", "clipsga", "clipup" refers to ClipUp.
"adam" refers to Adam.
"sgd" or "sga" refers to SGD.
optimizer_config: A dictionary containing the configurations to be
passed to the optimizer. If this argument is not None,
then, instead of the class being referred to, a dynamically
generated factory function will be returned, which will pass
these configurations to the actual class upon being called.
Returns:
The class, or a factory function instantiating that class.
"""
if s in ("clipsgd", "clipsga", "clipup"):
cls = ClipUp
elif s == "adam":
cls = Adam
elif s in ("sgd", "sga"):
cls = SGD
else:
raise ValueError(f"Unknown optimizer: {repr(s)}")
if optimizer_config is None:
return cls
else:
def f(*args, **kwargs):
nonlocal cls, optimizer_config
conf = {}
conf.update(optimizer_config)
conf.update(kwargs)
return cls(*args, **conf)
return f
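A minimal usage sketch follows, assuming evotorch and PyTorch are installed; the configuration values are illustrative.

```python
import torch

from evotorch.optimizers import get_optimizer_class

# Without a configuration dictionary, the class itself is returned:
OptimizerClass = get_optimizer_class("clipup")  # -> ClipUp

# With a configuration dictionary, a factory function is returned which merges
# the given configuration with any keyword arguments passed at call time:
make_optimizer = get_optimizer_class(
    "clipup", optimizer_config={"stepsize": 0.15, "max_speed": 0.3}
)
optimizer = make_optimizer(solution_length=100, dtype="float32")
step = optimizer.ascent(torch.randn(100))
```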