Various utilities for neuroevolution
This namespace contains the `GymNE` class.
GymNE (NEProblem)
Representation of a NeuroevolutionProblem where the goal is to maximize
the total reward obtained in a `gym` environment.
class GymNE(NEProblem):
Representation of a NeuroevolutionProblem where the goal is to maximize
the total reward obtained in a `gym` environment.
def __init__(
    self,
    env_name: str,
    network: Union[str, nn.Module, Callable[[], nn.Module]],
    network_args: Optional[dict] = None,
    env_config: Optional[Mapping] = None,
    observation_normalization: bool = False,
    num_episodes: int = 1,
    episode_length: Optional[int] = None,
    decrease_rewards_by: Optional[float] = None,
    num_actors: Optional[Union[int, str]] = "max",
    actor_config: Optional[dict] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
):
`__init__(...)`: Initialize the GymNE.
env_name: Name of the `gym` environment.
network: A network structure string, or a Callable (which can be
a class inheriting from `torch.nn.Module`, or a function
which returns a `torch.nn.Module` instance), or an instance
of `torch.nn.Module`.
The object provided here determines the structure of the
neural network policy whose parameters will be evolved.
A network structure string is a string which can be processed
by `str_to_net(...)`.
Please see the documentation of the function
`str_to_net(...)` to see what such
a neural network structure string looks like.
network_args: Optionally a dict-like object, storing keyword
arguments to be passed to the network while instantiating it.
env_config: Keyword arguments to pass to `gym.make(...)` while
creating the `gym` environment.
observation_normalization: Whether or not to do online observation
normalization.
num_episodes: Number of episodes over which a single solution will
be evaluated.
episode_length: Maximum number of simulator interactions allowed
in a single episode. If left as None, whether or not an episode
is terminated is determined only by the `gym` environment.
decrease_rewards_by: Some gym environments are defined in such a way
that the agent gets a constant reward for each timestep
it survives. This constant reward is also called a
"survival bonus". Such a reward scheme can lead the
evolution to local optima where the agent does nothing
but does not die either, just to collect the survival
bonuses. To prevent this, the survival bonus can be
removed from each reward obtained: when the argument
`decrease_rewards_by` is set to a positive float,
that number is subtracted from each reward.
num_actors: Number of actors to create for parallelized
evaluation of the solutions.
One can also set this as "max", which means that
an actor will be created on each available CPU.
When the parallelization is enabled each actor will have its
own instance of the `gym` environment.
In the case of `GymNE`, the default value for this argument
is "max", which means there will be full parallelization,
utilizing all the available CPUs.
actor_config: A dictionary, representing the keyword arguments
to be passed to the options(...) used when creating the
ray actor objects. To be used for explicitly allocating
resources per each actor.
For example, for declaring that each actor is to use a GPU,
one can pass `actor_config=dict(num_gpus=1)`.
Can also be given as None (which is the default),
if no such options are to be passed.
num_subbatches: If `num_subbatches` is None (assuming that
`subbatch_size` is also None), then, when evaluating a
population, the population will be split into `n` pieces, `n`
being the number of actors, and each actor will evaluate
its assigned piece. If `num_subbatches` is an integer `m`,
then the population will be split into `m` pieces,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
subbatch_size: If `subbatch_size` is None (assuming that
`num_subbatches` is also None), then, when evaluating a
population, the population will be split into `n` pieces, `n`
being the number of actors, and each actor will evaluate its
assigned piece. If `subbatch_size` is an integer `m`,
then the population will be split into pieces of size `m`,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
When there can be significant differences across the solutions
in terms of computational requirements, specifying a
`subbatch_size` can be beneficial, because, while one
actor is busy with a subbatch containing computationally
challenging solutions, other actors can accept more
tasks and save time.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
initial_bounds: Specifies an interval from which the values of the
initial policy parameters will be drawn.
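The effect of `decrease_rewards_by` can be illustrated without any simulator. The following is a minimal sketch and not EvoTorch code: `episode_return` and the two reward traces are hypothetical, chosen so that every timestep pays a survival bonus of 1.0.

```python
from typing import Sequence


def episode_return(raw_rewards: Sequence[float], decrease_rewards_by: float = 0.0) -> float:
    """Sum the per-step rewards after subtracting a constant from each of them."""
    return sum(r - decrease_rewards_by for r in raw_rewards)


# Hypothetical reward traces; each step includes a survival bonus of 1.0.
idle_agent = [1.0] * 100         # does nothing for 100 steps and never falls
moving_agent = [1.0 + 0.5] * 40  # gains 0.5 per step from progress, falls after 40 steps

# Without the correction, doing nothing looks better than making progress.
print(episode_return(idle_agent), episode_return(moving_agent))            # 100.0 60.0

# With the bonus subtracted, only the progress term remains.
print(episode_return(idle_agent, 1.0), episode_return(moving_agent, 1.0))  # 0.0 20.0
```

With the bonus removed, the idle agent no longer outranks the agent that actually moves.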
# Store various environment information
self._env_name = env_name
self._env_config = {} if env_config is None else deepcopy(dict(env_config))
self._decrease_rewards_by = 0.0 if decrease_rewards_by is None else float(decrease_rewards_by)
self._observation_normalization = bool(observation_normalization)
self._num_episodes = int(num_episodes)
self._episode_length = None if episode_length is None else int(episode_length)
self._info_keys = dict(cumulative_reward="avg", interaction_count="sum")
self._env: Optional[gym.Env] = None
self._obs_stats: Optional[RunningStat] = None
self._collected_stats: Optional[RunningStat] = None
# Create a temporary environment to read its dimensions
tmp_env = gym.make(self._env_name, **(self._env_config))
# Store the temporary environment's dimensions
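self._obs_length = len(tmp_env.observation_space.low)
if isinstance(tmp_env.action_space, gym.spaces.Discrete):
    # Discrete action space: one network output per possible action
    self._act_length = tmp_env.action_space.n
else:
    # Box action space: one network output per action dimension
    self._act_length = len(tmp_env.action_space.low)
self._obs_shape = tmp_env.observation_space.low.shape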
# Validate the space types of the environment
if self._observation_normalization:
    self._obs_stats = RunningStat()
    self._collected_stats = RunningStat()
else:
    self._obs_stats = None
    self._collected_stats = None
self._interaction_count: int = 0
self._episode_count: int = 0
objective_sense="max", # RL is maximization
network=network, # Using the policy as the network
def _network_constants(self) -> dict:
return {"obs_length": self._obs_length, "act_length": self._act_length, "obs_space": self._obs_shape}
def _get_env(self) -> gym.Env:
if self._env is None:
self._env = gym.make(self._env_name, **(self._env_config))
return self._env
def _normalize_observation(self, observation: Iterable, *, update_stats: bool = True) -> Iterable:
observation = np.asarray(observation, dtype="float32")
if self.observation_normalization:
if update_stats:
return self._obs_stats.normalize(observation)
return observation
def _use_policy(self, observation: Iterable, policy: nn.Module) -> Iterable:
with torch.no_grad():
result = policy(torch.as_tensor(observation, dtype=torch.float32, device="cpu")).numpy()
env = self._get_env()
if isinstance(env.action_space, gym.spaces.Discrete):
result = np.argmax(result)
elif isinstance(env.action_space, gym.spaces.Box):
result = np.clip(result, env.action_space.low, env.action_space.high)
return result
def _prepare(self) -> None:
def _rollout(
policy: nn.Module,
update_stats: bool = True,
visualize: bool = False,
decrease_rewards_by: Optional[float] = None,
) -> dict:
"""Perform a rollout of a network"""
if decrease_rewards_by is None:
decrease_rewards_by = self._decrease_rewards_by
decrease_rewards_by = float(decrease_rewards_by)
env = self._get_env()
observation = self._normalize_observation(reset_env(env), update_stats=update_stats)
if visualize:
t = 0
cumulative_reward = 0.0
while True:
observation, raw_reward, done, info = take_step_in_env(env, self._use_policy(observation, policy))
reward = raw_reward - decrease_rewards_by
t += 1
if update_stats:
self._interaction_count += 1
if visualize:
observation = self._normalize_observation(observation, update_stats=update_stats)
cumulative_reward += reward
if done or ((self._episode_length is not None) and (t >= self._episode_length)):
if update_stats:
self._episode_count += 1
final_info = dict(cumulative_reward=cumulative_reward, interaction_count=t)
for k in self._info_keys:
if k not in final_info:
final_info[k] = info[k]
return final_info
def _nonserialized_attribs(self) -> List[str]:
return super()._nonserialized_attribs + ["_env"]
def run(
    self,
    policy: nn.Module,
    *,
    update_stats: bool = False,
    visualize: bool = False,
    num_episodes: Optional[int] = None,
    decrease_rewards_by: Optional[float] = None,
) -> dict:
    """Evaluate the policy parameters on the gym environment."""
    if num_episodes is None:
        num_episodes = self._num_episodes
    # One rollout per episode; each rollout returns a dict of episode statistics
    episode_results = [
        self._rollout(
            policy=policy,
            update_stats=update_stats,
            visualize=visualize,
            decrease_rewards_by=decrease_rewards_by,
        )
        for _ in range(num_episodes)
    ]
    results = _accumulate_all_across_dicts(episode_results, self._info_keys)
    return results
def visualize(
policy: nn.Module,
update_stats: bool = False,
num_episodes: Optional[int] = 1,
decrease_rewards_by: Optional[float] = None,
) -> dict:
def _ensure_obsnorm(self):
if not self.observation_normalization:
raise ValueError("This feature can only be used when observation_normalization=True.")
def get_observation_stats(self) -> RunningStat:
"""Get the observation stats"""
return self._obs_stats
def _make_sync_data_for_actors(self) -> Any:
if self.observation_normalization:
return dict(obs_stats=self.get_observation_stats())
return None
def set_observation_stats(self, rs: RunningStat):
"""Set the observation stats"""
def _use_sync_data_from_main(self, received: dict):
    for k, v in received.items():
        if k == "obs_stats":
            self.set_observation_stats(v)
def pop_observation_stats(self) -> RunningStat:
"""Get and clear the collected observation stats"""
result = self._collected_stats
self._collected_stats = RunningStat()
return result
def _make_sync_data_for_main(self) -> Any:
result = dict(episode_count=self.episode_count, interaction_count=self.interaction_count)
if self.observation_normalization:
result["obs_stats_delta"] = self.pop_observation_stats()
return result
def update_observation_stats(self, rs: RunningStat):
"""Update the observation stats via another RunningStat instance"""
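def _use_sync_data_from_actors(self, received: list):
    total_episode_count = 0
    total_interaction_count = 0
    for data in received:
        data: dict
        total_episode_count += data["episode_count"]
        total_interaction_count += data["interaction_count"]
        if self.observation_normalization:
            self.update_observation_stats(data["obs_stats_delta"])
    self.set_episode_count(total_episode_count)
    self.set_interaction_count(total_interaction_count)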
def _make_pickle_data_for_main(self) -> dict:
# For when the main Problem object (the non-remote one) gets pickled,
# this function returns the counters of this remote Problem instance,
# to be sent to the main one.
return dict(interaction_count=self.interaction_count, episode_count=self.episode_count)
def _use_pickle_data_from_main(self, state: dict):
    # For when a newly unpickled Problem object gets (re)parallelized,
    # this function restores the inner states specific to this remote
    # worker. In the case of GymNE, those inner states are episode
    # and interaction counters.
    for k, v in state.items():
        if k == "episode_count":
            self.set_episode_count(v)
        elif k == "interaction_count":
            self.set_interaction_count(v)
        else:
            raise ValueError(f"When restoring the inner state of a remote worker, unrecognized state key: {k}")
def _extra_status(self, batch: SolutionBatch):
return dict(total_interaction_count=self.interaction_count, total_episode_count=self.episode_count)
@property
def observation_normalization(self) -> bool:
    """Get whether or not observation normalization is enabled."""
    return self._observation_normalization

def set_episode_count(self, n: int):
    """Set the episode count manually."""
    self._episode_count = int(n)

def set_interaction_count(self, n: int):
    """Set the interaction count manually."""
    self._interaction_count = int(n)

@property
def interaction_count(self) -> int:
    """Get the total number of simulator interactions made."""
    return self._interaction_count

@property
def episode_count(self) -> int:
    """Get the total number of episodes completed."""
    return self._episode_count

def _get_local_episode_count(self) -> int:
    return self.episode_count

def _get_local_interaction_count(self) -> int:
    return self.interaction_count
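def _evaluate_network(self, policy: nn.Module) -> Union[float, torch.Tensor]:
    # Counters and observation stats are updated while evaluating solutions
    result = self.run(policy, update_stats=True)
    return result["cumulative_reward"]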
def to_policy(self, x: Iterable, *, trainable_stats: bool = False, clip_actions: bool = True) -> nn.Module:
Convert the given parameter vector to a policy as a PyTorch module.
If the problem is configured to have observation normalization,
the PyTorch module also contains an additional normalization layer.
x: A sequence of real numbers, containing the parameters
of a policy. Can be a PyTorch tensor, a numpy array,
or a SolutionVector.
trainable_stats: Whether or not the observation stats within
the observation normalization layer are to be stored as
trainable parameters.
clip_actions: Whether or not to add an action clipping layer so
that the generated actions will always be within an
acceptable range for the environment.
The policy expressed by the parameters.
policy = [self.make_net(x)]
if self.observation_normalization:
policy.insert(0, ObsNormLayer(self._obs_stats, trainable_stats=trainable_stats))
if clip_actions and isinstance(self._get_env().action_space, gym.spaces.Box):
if len(policy) == 1:
return policy[0]
return nn.Sequential(*policy)
def get_env(self) -> gym.Env:
Get the gym environment stored by this GymNE instance
return self._get_env()
episode_count: int
Get the total number of episodes completed.
interaction_count: int
Get the total number of simulator interactions made.
observation_normalization: bool
Get whether or not observation normalization is enabled.
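When observation normalization is enabled, each actor collects its own observation statistics and the main process merges them (see `pop_observation_stats` and `update_observation_stats`). The sketch below shows one way such mergeable statistics can work. It is an illustration under assumptions: `RunningStatSketch` is a hypothetical class for scalar observations, and its count/sum/sum-of-squares representation is not claimed to match the internals of `RunningStat`.

```python
import math


class RunningStatSketch:
    """Hypothetical stand-in for a running-statistics tracker (scalar observations)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x) -> None:
        """Absorb a single observation, or merge another tracker into this one."""
        if isinstance(x, RunningStatSketch):
            self.count += x.count
            self.total += x.total
            self.total_sq += x.total_sq
        else:
            self.count += 1
            self.total += x
            self.total_sq += x * x

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def stdev(self) -> float:
        variance = self.total_sq / self.count - self.mean**2
        return math.sqrt(max(variance, 1e-8))  # floor avoids division by zero

    def normalize(self, x: float) -> float:
        return (x - self.mean) / self.stdev


# Two actors collect observations independently ...
actor_a, actor_b = RunningStatSketch(), RunningStatSketch()
for obs in (1.0, 2.0):
    actor_a.update(obs)
for obs in (3.0, 4.0):
    actor_b.update(obs)

# ... and the main process merges the collected deltas.
main_stats = RunningStatSketch()
for delta in (actor_a, actor_b):
    main_stats.update(delta)

print(main_stats.count, main_stats.mean)  # 4 2.5
print(main_stats.normalize(2.5))          # 0.0
```

Because merging is just addition of the three accumulators, the order in which actor deltas arrive does not matter.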
run(self, policy, *, update_stats=False, visualize=False, num_episodes=None, decrease_rewards_by=None)
Evaluate the policy parameters on the gym environment.
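The evaluation logic can be sketched in plain Python: each episode runs until the environment reports termination or the step limit is reached, and the per-episode results are then combined according to the "avg"/"sum" labels stored in `_info_keys`. Everything below is hypothetical: `CountdownEnv`, `rollout` and `accumulate` are stand-ins written for this illustration, and `accumulate` is only a guess at what `_accumulate_all_across_dicts` computes.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy environment: pays 1.0 per step and ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return 0.0

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.t += 1
        return float(self.t), 1.0, self.t >= self.horizon


def rollout(
    env: CountdownEnv,
    policy: Callable[[float], float],
    episode_length: Optional[int] = None,
    decrease_rewards_by: float = 0.0,
) -> Dict[str, float]:
    """Run one episode and report its cumulative reward and step count."""
    observation = env.reset()
    t = 0
    cumulative_reward = 0.0
    while True:
        observation, raw_reward, done = env.step(policy(observation))
        cumulative_reward += raw_reward - decrease_rewards_by
        t += 1
        # Stop when the environment says so, or when the step limit is hit
        if done or (episode_length is not None and t >= episode_length):
            return dict(cumulative_reward=cumulative_reward, interaction_count=t)


def accumulate(results: List[Dict[str, float]], info_keys: Dict[str, str]) -> Dict[str, float]:
    """Combine per-episode dicts: average the "avg" keys, add up the "sum" keys."""
    combined = {}
    for key, how in info_keys.items():
        total = sum(r[key] for r in results)
        combined[key] = total / len(results) if how == "avg" else total
    return combined


env = CountdownEnv(horizon=10)
episodes = [rollout(env, lambda obs: 0.0, episode_length=4) for _ in range(3)]
summary = accumulate(episodes, dict(cumulative_reward="avg", interaction_count="sum"))
print(summary)  # {'cumulative_reward': 4.0, 'interaction_count': 12}
```

Here the step limit of 4 cuts every episode short of the environment's own horizon of 10, so three episodes yield 12 interactions in total and an average reward of 4.0.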
set_episode_count(self, n)
set_interaction_count(self, n)
set_observation_stats(self, rs)
to_policy(self, x, *, trainable_stats=False, clip_actions=True)
Convert the given parameter vector to a policy as a PyTorch module.
If the problem is configured to have observation normalization, the PyTorch module also contains an additional normalization layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Iterable` | A sequence of real numbers, containing the parameters of a policy. Can be a PyTorch tensor, a numpy array, or a SolutionVector. | required |
| `trainable_stats` | `bool` | Whether or not the observation stats within the observation normalization layer are to be stored as trainable parameters. | `False` |
| `clip_actions` | `bool` | Whether or not to add an action clipping layer so that the generated actions will always be within an acceptable range for the environment. | `True` |

Returns:

| Type | Description |
|---|---|
| `Module` | The policy expressed by the parameters. |
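The layering performed by `to_policy` (an optional normalization layer in front of the network, an optional clipping layer behind it) can be sketched with plain callables. This is an illustration only: `make_obs_norm`, `make_linear`, `make_clip` and `to_policy_sketch` are hypothetical helpers operating on Python lists, not PyTorch modules.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_obs_norm(mean: Sequence[float], stdev: Sequence[float]) -> Layer:
    """Normalization layer: (observation - mean) / stdev, element by element."""
    return lambda obs: [(o - m) / s for o, m, s in zip(obs, mean, stdev)]


def make_linear(weights: Sequence[Sequence[float]]) -> Layer:
    """A bias-free linear layer standing in for the evolved network."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def make_clip(low: Sequence[float], high: Sequence[float]) -> Layer:
    """Clipping layer: keeps each action within [low, high]."""
    return lambda act: [min(max(a, lo), hi) for a, lo, hi in zip(act, low, high)]


def to_policy_sketch(net: Layer, obs_norm: Optional[Layer] = None, clip: Optional[Layer] = None) -> Layer:
    layers = [net]
    if obs_norm is not None:
        layers.insert(0, obs_norm)  # normalization goes before the network
    if clip is not None:
        layers.append(clip)         # clipping goes after the network
    if len(layers) == 1:
        return layers[0]            # nothing to chain, return the network itself

    def policy(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x

    return policy


net = make_linear([[2.0, 0.0], [0.0, -3.0]])
policy = to_policy_sketch(
    net,
    obs_norm=make_obs_norm([1.0, 1.0], [2.0, 2.0]),
    clip=make_clip([-1.0, -1.0], [1.0, 1.0]),
)
action = policy([3.0, 5.0])
print(action)  # [1.0, -1.0]
```

The observation `[3.0, 5.0]` is normalized to `[1.0, 2.0]`, mapped by the network to `[2.0, -6.0]`, and finally clipped into the action bounds.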
update_observation_stats(self, rs)
This namespace contains the `NeuroevolutionProblem` class.
NEProblem (Problem)
Base class for neuro-evolution problems where the goal is to optimize the parameters of a neural network represented as a PyTorch module.
Any problem inheriting from this class is expected to override the method `_evaluate_network(self, net: torch.nn.Module) -> Union[torch.Tensor, float]`, where `net` is the neural network to be evaluated, and the return value is a scalar or a vector (for multi-objective cases) expressing the fitness value(s).

Alternatively, this class can be directly instantiated in the following way:

    def f(module: MyTorchModuleClass) -> Union[float, torch.Tensor, tuple]:
        # Evaluate the given PyTorch module here
        fitness = ...
        return fitness

    problem = NEProblem(
        "min", MyTorchModuleClass, f,
    )

which specifies that the problem's goal is to minimize the return of the function `f`.

For multi-objective cases, the fitness returned by `f` is expected as a 1-dimensional tensor. For when the problem has additional evaluation data, a two-element tuple can be returned by `f` instead, where the first element is the fitness value(s) and the second element is a 1-dimensional tensor storing the additional data.
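The return convention for the evaluation function (either a fitness alone, or a two-element tuple of fitness and additional data) can be illustrated without PyTorch. In this sketch, plain lists of floats stand in for the module and for tensors, and `unpack_evaluation`, `f_plain` and `f_with_data` are hypothetical names introduced only for the example.

```python
from typing import Any, List, Tuple


def unpack_evaluation(result: Any) -> Tuple[Any, List[float]]:
    """Split an evaluator's return value into (fitness, additional evaluation data)."""
    if isinstance(result, tuple):
        if len(result) != 2:
            raise ValueError("Expected a two-element tuple: (fitness, eval_data)")
        fitness, eval_data = result
        return fitness, list(eval_data)
    # A bare return value is the fitness; there is no additional data
    return result, []


def f_plain(weights: List[float]) -> float:
    """Returns only a fitness (here: the squared norm of the weights)."""
    return sum(w * w for w in weights)


def f_with_data(weights: List[float]) -> Tuple[float, List[float]]:
    """Returns a fitness together with two extra values to be stored alongside it."""
    fitness = sum(w * w for w in weights)
    return fitness, [max(weights), min(weights)]


weights = [1.0, -2.0]
print(unpack_evaluation(f_plain(weights)))      # (5.0, [])
print(unpack_evaluation(f_with_data(weights)))  # (5.0, [1.0, -2.0])
```

In the real class, the length of the additional data would correspond to the `eval_data_length` argument of `__init__`.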
class NEProblem(Problem):
Base class for neuro-evolution problems where the goal is to optimize the
parameters of a neural network represented as a PyTorch module.
Any problem inheriting from this class is expected to override the method
`_evaluate_network(self, net: torch.nn.Module) -> Union[torch.Tensor, float]`
where `net` is the neural network to be evaluated, and the return value
is a scalar or a vector (for multi-objective cases) expressing the
fitness value(s).
Alternatively, this class can be directly instantiated in the following
way:

    def f(module: MyTorchModuleClass) -> Union[float, torch.Tensor, tuple]:
        # Evaluate the given PyTorch module here
        fitness = ...
        return fitness

    problem = NEProblem(
        "min", MyTorchModuleClass, f,
    )

which specifies that the problem's goal is to minimize the return of the
function `f`.
For multi-objective cases, the fitness returned by `f` is expected as a
1-dimensional tensor. For when the problem has additional evaluation data,
a two-element tuple can be returned by `f` instead, where the first
element is the fitness value(s) and the second element is a 1-dimensional
tensor storing the additional data.
def __init__(
    self,
    objective_sense: ObjectiveSense,
    network: Union[str, nn.Module, Callable[[], nn.Module]],
    network_eval_func: Optional[Callable] = None,
    network_args: Optional[dict] = None,
    initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
    eval_dtype: Optional[DType] = None,
    eval_data_length: int = 0,
    seed: Optional[int] = None,
    num_actors: Optional[Union[int, str]] = "num_devices",
    actor_config: Optional[dict] = None,
    num_gpus_per_actor: Optional[Union[int, float, str]] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    device: Optional[Device] = None,
):
`__init__(...)`: Initialize the NEProblem.
objective_sense: The objective sense, expected as "min" or "max"
for single-objective cases, or as a sequence of strings
(each string being "min" or "max") for multi-objective cases.
network: A network structure string, or a Callable (which can be
a class inheriting from `torch.nn.Module`, or a function
which returns a `torch.nn.Module` instance), or an instance
of `torch.nn.Module`.
The object provided here determines the structure of the
neural network whose parameters will be evolved.
A network structure string is a string which can be processed
by `str_to_net(...)`.
Please see the documentation of the function
`str_to_net(...)` to see what such
a neural network structure string looks like.
network_eval_func: Optionally a function (or any Callable object)
which receives a PyTorch module as its argument, and returns
either a fitness, or a two-element tuple containing the fitness
and the additional evaluation data. The fitness can be a scalar
(for single-objective cases) or a 1-dimensional tensor (for
multi-objective cases). The additional evaluation data is
expected as a 1-dimensional tensor.
If this argument is left as None, it will be expected that
the method `_evaluate_network(...)` is overridden by the
inheriting class.
network_args: Optionally a dict-like object, storing keyword
arguments to be passed to the network while instantiating it.
initial_bounds: Specifies an interval from which the values of the
initial neural network parameters will be drawn.
eval_dtype: dtype to be used for fitnesses. If not specified, then
`eval_dtype` will be inferred from the dtype of the parameters
of the neural network.
In more detail, if the neural network's parameters have a
float dtype, `eval_dtype` will be a compatible float.
Otherwise, it will be "float32".
eval_data_length: Length of the extra evaluation data.
seed: Random number seed. If left as None, this NEProblem instance
will not have its own random generator, and the global random
generator of PyTorch will be used instead.
num_actors: Number of actors to create for parallelized
evaluation of the solutions.
Certain string values are also accepted.
When given as "max" or as "num_cpus", the number of actors
will be equal to the number of all available CPUs in the ray
cluster.
When given as "num_gpus", the number of actors will be
equal to the number of all available GPUs in the ray
cluster, and each actor will be assigned a GPU.
When given as "num_devices", the number of actors will be
equal to the minimum among the number of CPUs and the number
of GPUs available in the cluster (or will be equal to the
number of CPUs if there is no GPU), and each actor will be
assigned a GPU (if available).
If `num_actors` is given as "num_gpus" or "num_devices",
the argument `num_gpus_per_actor` must not be used,
and the `actor_config` dictionary must not contain the
key "num_gpus".
If `num_actors` is given as something other than "num_gpus"
or "num_devices", and if you wish to assign GPUs to each
actor, then please see the argument `num_gpus_per_actor`.
actor_config: A dictionary, representing the keyword arguments
to be passed to the options(...) used when creating the
ray actor objects. To be used for explicitly allocating
resources per each actor.
For example, for declaring that each actor is to use a GPU,
one can pass `actor_config=dict(num_gpus=1)`.
Can also be given as None (which is the default),
if no such options are to be passed.
num_gpus_per_actor: Number of GPUs to be allocated by each
remote actor.
The default behavior is to NOT allocate any GPU at all
(which is the default behavior of the ray library as well).
When given as a number `n`, each actor will be given
`n` GPUs (where `n` can be an integer, or can be a `float`
for fractional allocation).
When given as a string "max", then the available GPUs
across the entire ray cluster (or within the local computer
in the simplest cases) will be equally distributed among
the actors.
When given as a string "all", then each actor will have
access to all the GPUs (this will be achieved by suppressing
the environment variable `CUDA_VISIBLE_DEVICES` for each
actor).
When the problem is not distributed (i.e. when there are
no actors), this argument is expected to be left as None.
num_subbatches: If `num_subbatches` is None (assuming that
`subbatch_size` is also None), then, when evaluating a
population, the population will be split into `n` pieces, `n`
being the number of actors, and each actor will evaluate
its assigned piece. If `num_subbatches` is an integer `m`,
then the population will be split into `m` pieces,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
While using a distributed algorithm, this argument determines
how many sub-batches will be generated, and therefore,
how many gradients will be computed by the remote actors.
subbatch_size: If `subbatch_size` is None (assuming that
`num_subbatches` is also None), then, when evaluating a
population, the population will be split into `n` pieces, `n`
being the number of actors, and each actor will evaluate its
assigned piece. If `subbatch_size` is an integer `m`,
then the population will be split into pieces of size `m`,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
When there can be significant differences across the solutions
in terms of computational requirements, specifying a
`subbatch_size` can be beneficial, because, while one
actor is busy with a subbatch containing computationally
challenging solutions, other actors can accept more
tasks and save time.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
While using a distributed algorithm, this argument determines
the size of a sub-batch (or sub-population) sampled by a
remote actor for computing a gradient.
In distributed mode, it is expected that the population size
is divisible by `subbatch_size`.
device: Default device in which a new population will be generated
and the neural networks will operate.
If not specified, "cpu" will be used.
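The three splitting modes described for `num_subbatches` and `subbatch_size` can be sketched as follows. This is a simplified illustration, not the library's implementation: `split_population` is a hypothetical helper, and the way a remainder is spread over the pieces is an assumption.

```python
from typing import List, Optional, Sequence


def split_population(
    population: Sequence[int],
    num_actors: int,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
) -> List[List[int]]:
    """Split a population into the pieces that actors would evaluate."""
    if num_subbatches is not None and subbatch_size is not None:
        raise ValueError("num_subbatches and subbatch_size cannot both be given")

    n = len(population)
    if subbatch_size is not None:
        # Fixed-size pieces; the last piece may be smaller
        return [list(population[i : i + subbatch_size]) for i in range(0, n, subbatch_size)]

    # Fixed number of pieces: one per actor by default, else `num_subbatches`
    num_pieces = num_actors if num_subbatches is None else num_subbatches
    base, extra = divmod(n, num_pieces)
    pieces, start = [], 0
    for i in range(num_pieces):
        size = base + (1 if i < extra else 0)  # assumption: spread the remainder evenly
        pieces.append(list(population[start : start + size]))
        start += size
    return pieces


population = list(range(10))
print([len(p) for p in split_population(population, num_actors=4)])                    # [3, 3, 2, 2]
print([len(p) for p in split_population(population, num_actors=4, num_subbatches=5)])  # [2, 2, 2, 2, 2]
print([len(p) for p in split_population(population, num_actors=4, subbatch_size=4)])   # [4, 4, 2]
```

With more pieces than actors, an actor that finishes early can pick up the next unevaluated piece, which is the load-balancing benefit the argument descriptions refer to.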
# Set the main device of the problem
# Although the operation of setting the main device is done by the main Problem class,
# here we need this at an earlier stage.
if device is None:
device = "cpu"
self._device = torch.device(device)
# Set the network
self._original_network = network
self._network_args = {} if network_args is None else deepcopy(network_args)
if isinstance(self._original_network, nn.Module):
self._original_network = self._original_network.cpu()
# Store the function that will evaluate the network, if available
self._network_eval_func: Optional[Callable] = network_eval_func
self.instantiated_network: nn.Module = None
# Create temporary network
temp_network = self._instantiate_net(self._original_network, device="cpu")
bounds=None, # Neuroevolution is an unbounded problem
solution_length=count_parameters(temp_network), # The solution length is inherited from the network passed
dtype=next(temp_network.parameters()).dtype, # The datatype is inherited from the network passed
def network_device(self) -> Device:
"""The device on which the problem should place data e.g. the network"""
cpu_device = torch.device("cpu")
if self.is_main:
# This is the case where this is the main process (not a remote actor)
if self.device == cpu_device:
# If the main device of the problem is "cpu", then we assume that the network is going to be on the cpu as well
return cpu_device
# If the main device of the problem is some other device, then it is that device into which the network will be put
return self.device
# If this is a remote actor, then the network will be put into the auxiliary device allocated for that actor
return self.aux_device
def _network_constants(self) -> dict:
"""Named constants which can be passed to the network instantiation e.g. input/output dimension. To be overridden by the user for custom fixed constants for a problem"""
return {}
def network_constants(self) -> dict:
"""Named constants which can be passed to the network instantiation e.g. input/output dimension"""
constants = {}
return constants
def _nonserialized_attribs(self) -> List[str]:
return ["instantiated_network"]
def _instantiate_net(self, network: Union[str, nn.Module, dict], device: Optional[Device] = None) -> nn.Module:
"""Instantiate the network on the target device, to be overridden by the user for custom behaviour
instantiated_network (nn.Module): The network instantiated on the target device
# Branching point determines instantiation of network
if isinstance(network, str):
    # Passed argument was a string representation of a torch module
    instantiated_network = str_to_net(network, **self.network_constants())
elif isinstance(network, nn.Module):
    # Passed argument was directly a torch module
    instantiated_network = network
else:
    # Passed argument was callable yielding network
    instantiated_network = network(**self.network_constants())
# Map to device
device = self.network_device if device is None else device
instantiated_network = instantiated_network.to(device)
return instantiated_network
def _prepare(self) -> None:
"""Instantiate the network on the target device, if not already done"""
self.instantiated_network = self._instantiate_net(self._original_network)
# Clear reference to original network
self._original_network = None
def make_net(self, parameters: Iterable) -> nn.Module:
Make a new network filled with the provided parameters.
parameters: Parameters to be used as weights within the network.
Can be a Solution, or any 1-dimensional Iterable that can be
converted to a PyTorch tensor.
A new network, as a `torch.nn.Module` instance.
if isinstance(parameters, Solution):
parameters = parameters.access_values(keep_evals=True)
parameters = self.as_tensor(parameters)
with torch.no_grad():
net = deepcopy(self.parameterize_net(parameters))
return net
def parameterize_net(self, parameters: torch.Tensor) -> nn.Module:
"""Parameterize the network with a given set of parameters.
parameters (torch.Tensor): The parameters with which to instantiate the network
instantiated_network (nn.Module): The network instantiated with the parameters
# Check if network exists
if self.instantiated_network is None:
self.instantiated_network = self._instantiate_net(self._original_network)
network = self.instantiated_network
# Move the parameters if needed
if parameters.device != self.network_device:
    parameters = parameters.to(self.network_device)
# Fill the network with the parameters
fill_parameters(network, parameters)
# Return the network
return network
def _grad_device(self) -> Device:
Get the device in which new solutions will be made in distributed mode.
In more detail, in distributed mode, each actor creates its own
sub-populations, evaluates them, and computes its own gradient
(all such actor gradients eventually being collected by the
distribution-based search algorithm in the main process).
For some problem types, it can make sense for the remote actors to
create their temporary sub-populations on another device
(e.g. on the GPU that is allocated specifically for them).
For such situations, one is encouraged to override this property
and make it return whatever device is to be used.
In the case of NEProblem, this property returns whatever device
is specified by the property `network_device`.
return self.network_device
def _evaluate_network(self, network: nn.Module) -> Union[float, torch.Tensor, tuple]:
Evaluate a network and return the evaluation result(s).
In the case where the `__init__` of `NEProblem` was not given
a network evaluator function (via the argument `network_eval_func`),
it will be expected that the inheriting class overrides this
method and defines how a network should be evaluated.
network (nn.Module): The network to evaluate
fitness: The network's fitness value(s), as a scalar for
single-objective cases, or as a 1-dimensional tensor
for multi-objective cases. The returned value can also
be a two-element tuple where the first element is the
fitness (as a scalar or as a vector) and the second
element is a 1-dimensional vector storing the extra
evaluation data.
raise NotImplementedError
def _evaluate(self, solution: Solution):
Evaluate a single solution.
This is achieved by parameterising the problem's attribute
named `instantiated_network`, and then evaluating the network
with the method `_evaluate_network(...)`.
solution (Solution): The solution to evaluate.
parameters = solution.values
if self._network_eval_func is None:
    evaluator = self._evaluate_network
else:
    evaluator = self._network_eval_func
fitnesses = evaluator(self.parameterize_net(parameters))
if isinstance(fitnesses, tuple):
network_device: Union[str, torch.device]
The device on which the problem should place data e.g. the network
# Set the main device of the problem
# Although the operation of setting the main device is done by the main Problem class,
# here we need this at an earlier stage.
if device is None:
device = "cpu"
self._device = torch.device(device)
# Set the network
self._original_network = network
self._network_args = {} if network_args is None else deepcopy(network_args)
if isinstance(self._original_network, nn.Module):
self._original_network = self._original_network.cpu()
# Store the function that will evaluate the network, if available
self._network_eval_func: Optional[Callable] = network_eval_func
self.instantiated_network: nn.Module = None
# Create temporary network
temp_network = self._instantiate_net(self._original_network, device="cpu")
bounds=None, # Neuroevolution is an unbounded problem
solution_length=count_parameters(temp_network), # The solution length is inherited from the network passed
dtype=next(temp_network.parameters()).dtype, # The datatype is inherited from the network passed
make_net(self, parameters)
Make a new network filled with the provided parameters.
Name | Type | Description | Default |
parameters | Iterable | Parameters to be used as weights within the network. Can be a Solution, or any 1-dimensional Iterable that can be converted to a PyTorch tensor. | required |
Type | Description |
Module | A new network, as a `torch.nn.Module` instance. |
Named constants which can be passed to the network instantiation e.g. input/output dimension
parameterize_net(self, parameters)
Parameterize the network with a given set of parameters.
Name | Type | Description | Default |
parameters | torch.Tensor | The parameters with which to instantiate the network | required |
Type | Description |
instantiated_network (nn.Module) | The network instantiated with the parameters |
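The practical difference between `parameterize_net` and `make_net` can be illustrated without PyTorch. The sketch below is a toy stand-in, not the library's implementation: the class `ToyProblem` and its list-based "network" are hypothetical. It only mirrors the pattern visible in the source, where `parameterize_net` refills and returns one shared object, while `make_net` returns a deep copy of it.

```python
from copy import deepcopy


class ToyProblem:
    """Hypothetical stand-in: the 'network' is just a list of weights."""

    def __init__(self, num_weights):
        # One shared network instance, reused across evaluations
        self.instantiated_network = [0.0] * num_weights

    def parameterize_net(self, parameters):
        # Overwrite the weights of the shared network in place and return it
        self.instantiated_network[:] = list(parameters)
        return self.instantiated_network

    def make_net(self, parameters):
        # Return an independent copy, safe to keep after later refills
        return deepcopy(self.parameterize_net(parameters))


problem = ToyProblem(3)
shared = problem.parameterize_net([1.0, 2.0, 3.0])
kept = problem.make_net([1.0, 2.0, 3.0])
problem.parameterize_net([9.0, 9.0, 9.0])  # a later evaluation refills the shared network

print(shared)  # the shared object now holds the new weights
print(kept)    # the copy still holds the old weights
```

Under these assumptions, a network that must outlive the next evaluation (for example, one that is saved or visualized) would be obtained through `make_net`.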
Utility classes and functions for neural networks
Various neural network layer types
Apply (Module)
A torch module for applying an arithmetic operator on an input tensor
class Apply(nn.Module):
"""A torch module for applying an arithmetic operator on an input tensor"""
def __init__(self, operator: str, argument: float):
"""`__init__(...)`: Initialize the Apply module.
operator: Must be '+', '-', '*', '/', or '**'.
Indicates which operation will be done
on the input tensor.
argument: Expected as a float, represents
the right-argument of the operation
(the left-argument being the input
self._operator = str(operator)
assert self._operator in ("+", "-", "*", "/", "**")
self._argument = float(argument)
def forward(self, x):
op = self._operator
arg = self._argument
if op == "+":
return x + arg
elif op == "-":
return x - arg
elif op == "*":
return x * arg
elif op == "/":
return x / arg
elif op == "**":
return x**arg
raise ValueError("Unknown operator:" + repr(op))
def extra_repr(self):
return "operator={}, argument={}".format(repr(self._operator), self._argument)
__init__(self, operator, argument)
: Initialize the Apply module.
Name | Type | Description | Default |
operator | str | Must be '+', '-', '*', '/', or '**'. Indicates which operation will be done on the input tensor. | required |
argument | float | Expected as a float, represents the right-argument of the operation (the left-argument being the input tensor). | required |
forward(self, x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Bin (Module)
A small torch module for binning the values of tensors.
In more detail, considering a lower bound value lb, an upper bound value ub, and an input tensor x, each value within x closer to lb will be converted to lb and each value within x closer to ub will be converted to ub.
class Bin(nn.Module):
"""A small torch module for binning the values of tensors.
In more detail, considering a lower bound value lb,
an upper bound value ub, and an input tensor x,
each value within x closer to lb will be converted to lb
and each value within x closer to ub will be converted to ub.
def __init__(self, lb: float, ub: float):
"""`__init__(...)`: Initialize the Bin operator.
lb: Lower bound
ub: Upper bound
self._lb = float(lb)
self._ub = float(ub)
self._interval_size = self._ub - self._lb
self._shrink_amount = self._interval_size / 2.0
self._shift_amount = (self._ub + self._lb) / 2.0
def forward(self, x: torch.Tensor):
x = x - self._shift_amount
x = x / self._shrink_amount
x = torch.sign(x)
x = x * self._shrink_amount
x = x + self._shift_amount
return x
def extra_repr(self):
return "lb={}, ub={}".format(self._lb, self._ub)
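The arithmetic of `Bin.forward` can be traced on plain floats. The following is a dependency-free sketch, assuming scalar inputs instead of tensors; the helper names `sign` and `bin_value` are hypothetical and only mirror the shift, shrink, sign, and restore steps shown in the source above.

```python
def sign(v):
    # Scalar counterpart of torch.sign: -1, 0, or 1
    return (v > 0) - (v < 0)


def bin_value(x, lb, ub):
    shift = (ub + lb) / 2.0   # midpoint of the interval
    shrink = (ub - lb) / 2.0  # half of the interval size
    # Center on the midpoint, keep only the sign, then map back to lb or ub
    return sign((x - shift) / shrink) * shrink + shift


values = [bin_value(x, 0.0, 1.0) for x in (0.2, 0.9, 0.5)]
print(values)
```

In this sketch, a value lying exactly on the midpoint stays at the midpoint, because the sign of zero is zero.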
__init__(self, lb, ub)
: Initialize the Bin operator.
Name | Type | Description | Default |
lb | float | Lower bound | required |
ub | float | Upper bound | required |
forward(self, x)
Clip (Module)
A small torch module for clipping the values of tensors
class Clip(nn.Module):
"""A small torch module for clipping the values of tensors"""
def __init__(self, lb: float, ub: float):
"""`__init__(...)`: Initialize the Clip operator.
lb: Lower bound. Values less than this will be clipped.
ub: Upper bound. Values greater than this will be clipped.
self._lb = float(lb)
self._ub = float(ub)
def forward(self, x: torch.Tensor):
return x.clamp(self._lb, self._ub)
def extra_repr(self):
return "lb={}, ub={}".format(self._lb, self._ub)
__init__(self, lb, ub)
: Initialize the Clip operator.
Name | Type | Description | Default |
lb | float | Lower bound. Values less than this will be clipped. | required |
ub | float | Upper bound. Values greater than this will be clipped. | required |
forward(self, x)
FeedForwardNet (Module)
Representation of a feed forward neural network as a torch Module.
An example initialization of a FeedForwardNet is as follows:
net = drt.FeedForwardNet(4, [(8, 'tanh'), (6, 'tanh')])
which means that we would like to have a network which expects an input vector of length 4 and passes its input through 2 tanh-activated hidden layers (with neuron counts 8 and 6, respectively). The output of the last hidden layer (of length 6) is the final output vector.
The string representation of the module obtained via the example above is:
(layer_0): Linear(in_features=4, out_features=8, bias=True)
(actfunc_0): Tanh()
(layer_1): Linear(in_features=8, out_features=6, bias=True)
(actfunc_1): Tanh()
class FeedForwardNet(nn.Module):
Representation of a feed forward neural network as a torch Module.
An example initialization of a FeedForwardNet is as follows:
net = drt.FeedForwardNet(4, [(8, 'tanh'), (6, 'tanh')])
which means that we would like to have a network which expects an input
vector of length 4 and passes its input through 2 tanh-activated hidden
layers (with neuron counts 8 and 6, respectively).
The output of the last hidden layer (of length 6) is the final
output vector.
The string representation of the module obtained via the example above
(layer_0): Linear(in_features=4, out_features=8, bias=True)
(actfunc_0): Tanh()
(layer_1): Linear(in_features=8, out_features=6, bias=True)
(actfunc_1): Tanh()
LengthActTuple = Tuple[int, Union[str, Callable]]
LengthActBiasTuple = Tuple[int, Union[str, Callable], Union[bool]]
def __init__(self, input_size: int, layers: List[Union[LengthActTuple, LengthActBiasTuple]]):
"""`__init__(...)`: Initialize the FeedForward network.
input_size: Input size of the network, expected as an int.
layers: Expected as a list of tuples,
where each tuple is either of the form
`(layer_size, activation_function)`
or of the form
`(layer_size, activation_function, bias)`
in which
(i) `layer_size` is an int, specifying the number of neurons;
(ii) `activation_function` is None, or a callable object,
or a string containing the name of the activation function
('relu', 'selu', 'elu', 'tanh', 'hardtanh', 'sigmoid', or 'round');
(iii) `bias` is a boolean, specifying whether the layer
is to have a bias or not.
When omitted, bias is set to True.
for i, layer in enumerate(layers):
    if len(layer) == 2:
        size, actfunc = layer
        bias = True
    elif len(layer) == 3:
        size, actfunc, bias = layer
    else:
        assert False, "A layer tuple of invalid size is encountered"
    setattr(self, "layer_" + str(i), nn.Linear(input_size, size, bias=bias))
    if isinstance(actfunc, str):
        if actfunc == "relu":
            actfunc = nn.ReLU()
        elif actfunc == "selu":
            actfunc = nn.SELU()
        elif actfunc == "elu":
            actfunc = nn.ELU()
        elif actfunc == "tanh":
            actfunc = nn.Tanh()
        elif actfunc == "hardtanh":
            actfunc = nn.Hardtanh()
        elif actfunc == "sigmoid":
            actfunc = nn.Sigmoid()
        elif actfunc == "round":
            actfunc = Round()
        else:
            raise ValueError("Unknown activation function: " + repr(actfunc))
    setattr(self, "actfunc_" + str(i), actfunc)
    input_size = size
def forward(self, x):
i = 0
while hasattr(self, "layer_" + str(i)):
x = getattr(self, "layer_" + str(i))(x)
f = getattr(self, "actfunc_" + str(i))
if f is not None:
x = f(x)
i += 1
return x
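Since neuroevolution treats all weights of the network as one flat solution vector, it helps to know how many parameters a given `layers` specification produces. The sketch below is a dependency-free estimate, assuming that the activation functions carry no trainable parameters (true for the string options listed above, not necessarily for an arbitrary callable). The function name `count_feedforward_parameters` is hypothetical.

```python
def count_feedforward_parameters(input_size, layers):
    """Count weights and biases implied by a FeedForwardNet-style layer list."""
    total = 0
    for layer in layers:
        if len(layer) == 2:
            size, _actfunc = layer
            bias = True  # bias defaults to True when omitted
        elif len(layer) == 3:
            size, _actfunc, bias = layer
        else:
            raise ValueError("A layer tuple must have 2 or 3 elements")
        total += input_size * size  # weight matrix of the linear layer
        if bias:
            total += size           # one bias per neuron
        input_size = size           # the next layer consumes this layer's output
    return total


n_default = count_feedforward_parameters(4, [(8, "tanh"), (6, "tanh")])
n_no_bias = count_feedforward_parameters(4, [(8, "tanh"), (6, "tanh", False)])
print(n_default, n_no_bias)
```

For the example network with input size 4 and layers of 8 and 6 neurons, this gives (4*8 + 8) + (8*6 + 6) parameters with biases enabled.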
__init__(self, input_size, layers)
: Initialize the FeedForward network.
Name | Type | Description | Default |
input_size | int | Input size of the network, expected as an int. | required |
layers | List[Union[Tuple[int, Union[str, Callable]], Tuple[int, Union[str, Callable], bool]]] | Expected as a list of tuples, where each tuple is either of the form `(layer_size, activation_function)` or of the form `(layer_size, activation_function, bias)`. | required |
LSTMNet (StatefulModule)
Representation of an LSTM layer.
Differently from torch.nn.LSTM, the forward pass function of this class does NOT expect the hidden state, nor does it return the resulting hidden state of the pass. Instead, the hidden states are stored within the module itself.
The forward pass function can take a 1-dimensional tensor of length input_size, or it can take a 2-dimensional tensor of size (batch_size, input_size).
Because the instances of this class are stateful, remember to reset() the internal state when needed.
class LSTMNet(StatefulModule):
"""Representation of an LSTM layer.
Differently from torch.nn.LSTM, the forward pass function of this class
does NOT expect the hidden state, nor does it return
the resulting hidden state of the pass.
Instead, the hidden states are stored within the module itself.
The forward pass function can take a 1-dimensional tensor of length
input_size, or it can take a 2-dimensional tensor of size
`(batch_size, input_size)`.
Because the instances of this class are stateful,
remember to reset() the internal state when needed.
def __init__(self, **kwargs):
`__init__(...)`: Initialize the LSTM net.
input_size: The input size, expected as an int.
hidden_size: Number of neurons, expected as an int.
num_layers: Number of layers of the recurrent net.
StatefulModule.__init__(self, nn.LSTM, **kwargs)
__init__(self, **kwargs)
: Initialize the LSTM net.
Name | Type | Description | Default |
input_size | | The input size, expected as an int. | required |
hidden_size | | Number of neurons, expected as an int. | required |
num_layers | | Number of layers of the recurrent net. | required |
LocomotorNet (Module)
This is a control network which consists of two components: one linear, and one non-linear. The non-linear component is an input-independent set of sinusoidal waves whose amplitudes, frequencies and phases are trainable. Upon execution of a forward pass, the output of the non-linear component is the sum of all these sinusoidal waves. The linear component is a linear layer (optionally with bias) whose weights (and biases) are trainable. The final output of the LocomotorNet at the end of a forward pass is the sum of the linear and the non-linear components.
Note that this is a stateful network, where the only state is the timestep t, which starts from 0 and gets incremented by 1 at the end of each forward pass. The reset() method resets t back to 0.
Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018). Structured Control Nets for Deep Reinforcement Learning.
class LocomotorNet(nn.Module):
"""LocomotorNet: A locomotion-specific structured control net.
This is a control network which consists of two components:
one linear, and one non-linear. The non-linear component
is an input-independent set of sinusoidal waves whose
amplitudes, frequencies and phases are trainable.
Upon execution of a forward pass, the output of the non-linear
component is the sum of all these sinusoidal waves.
The linear component is a linear layer (optionally with bias)
whose weights (and biases) are trainable.
The final output of the LocomotorNet at the end of a forward pass
is the sum of the linear and the non-linear components.
Note that this is a stateful network, where the only state
is the timestep t, which starts from 0 and gets incremented by 1
at the end of each forward pass. The `reset()` method resets
t back to 0.
Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018).
Structured Control Nets for Deep Reinforcement Learning.
def __init__(self, *, in_features: int, out_features: int, bias: bool = True, num_sinusoids=16):
"""`__init__(...)`: Initialize the LocomotorNet.
in_features: Length of the input vector
out_features: Length of the output vector
bias: Whether or not the linear component is to have a bias
num_sinusoids: Number of sinusoidal waves
self._in_features = in_features
self._out_features = out_features
self._bias = bias
self._num_sinusoids = num_sinusoids
self._linear_component = nn.Linear(
in_features=self._in_features, out_features=self._out_features, bias=self._bias
self._amplitudes = nn.ParameterList()
self._frequencies = nn.ParameterList()
self._phases = nn.ParameterList()
for _ in range(self._num_sinusoids):
for paramlist in (self._amplitudes, self._frequencies, self._phases):
paramlist.append(nn.Parameter(torch.randn(self._out_features, dtype=torch.float32)))
def reset(self):
"""Set the timestep t to 0"""
self._t = 0
def t(self) -> int:
"""The current timestep t"""
return self._t
def in_features(self) -> int:
"""Get the length of the input vector"""
return self._in_features
def out_features(self) -> int:
"""Get the length of the output vector"""
return self._out_features
def num_sinusoids(self) -> int:
"""Get the number of sinusoidal waves of the non-linear component"""
return self._num_sinusoids
def bias(self) -> bool:
"""Get whether or not the linear component has bias"""
return self._bias
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Execute a forward pass"""
u_linear = self._linear_component(x)
t = self._t
u_nonlinear = torch.zeros(self._out_features)
for i in range(self._num_sinusoids):
A = self._amplitudes[i]
w = self._frequencies[i]
phi = self._phases[i]
u_nonlinear = u_nonlinear + (A * torch.sin(w * t + phi))
self._t += 1
return u_linear + u_nonlinear
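The non-linear component computes, at timestep t, the sum of A_i * sin(w_i * t + phi_i) over all sinusoids. The following is a scalar, dependency-free sketch of that component alone. The class name `SinusoidBank` is hypothetical, and plain floats stand in for the per-output parameter vectors used by the module.

```python
import math


class SinusoidBank:
    """Input-independent sum of sinusoids with an internal timestep t."""

    def __init__(self, amplitudes, frequencies, phases):
        self.amplitudes = list(amplitudes)
        self.frequencies = list(frequencies)
        self.phases = list(phases)
        self.t = 0

    def reset(self):
        # Set the timestep back to 0
        self.t = 0

    def step(self):
        # Sum of A * sin(w * t + phi) at the current timestep
        value = sum(
            a * math.sin(w * self.t + phi)
            for a, w, phi in zip(self.amplitudes, self.frequencies, self.phases)
        )
        self.t += 1  # the timestep advances after every call
        return value


bank = SinusoidBank([2.0], [math.pi / 2], [0.0])
first = bank.step()   # t = 0
second = bank.step()  # t = 1
bank.reset()
print(first, second, bank.t)
```

Because the output depends on t rather than on the input, forgetting to call `reset()` between episodes would start the next episode in the middle of the wave pattern.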
__init__(self, *, in_features, out_features, bias=True, num_sinusoids=16)
: Initialize the LocomotorNet.
forward(self, x)
Execute a forward pass
RecurrentNet (StatefulModule)
Representation of a fully connected recurrent net as a torch Module.
Differently from torch.nn.RNN, the forward pass function of this class does NOT expect the hidden state, nor does it return the resulting hidden state of the pass. Instead, the hidden states are stored within the module itself.
The forward pass function can take a 1-dimensional tensor of length input_size, or it can take a 2-dimensional tensor of size (batch_size, input_size).
Because the instances of this class are stateful, remember to reset() the internal state when needed.
Source code in evotorch/neuroevolution/net/
class RecurrentNet(StatefulModule):
"""Representation of a fully connected recurrent net as a torch Module.
Differently from torch.nn.RNN, the forward pass function of this class
does NOT expect the hidden state, nor does it return
the resulting hidden state of the pass.
Instead, the hidden states are stored within the module itself.
The forward pass function can take a 1-dimensional tensor of length
input_size, or it can take a 2-dimensional tensor of size
(batch_size, input_size).
Because the instances of this class are stateful,
remember to reset() the internal state when needed.
def __init__(self, **kwargs):
`__init__(...)`: Initialize the recurrent net.
input_size: The input size, expected as an int.
hidden_size: Number of neurons, expected as an int.
nonlinearity: The activation function,
expected as 'tanh' or 'relu'.
num_layers: Number of layers of the recurrent net.
StatefulModule.__init__(self, nn.RNN, **kwargs)
Round (Module)
A small torch module for rounding the values of an input tensor
Source code in evotorch/neuroevolution/net/
class Round(nn.Module):
"""A small torch module for rounding the values of an input tensor"""
def __init__(self, ndigits: int = 0):
self._ndigits = int(ndigits)
self._q = 10.0**self._ndigits
def forward(self, x):
x = x * self._q
x = torch.round(x)
x = x / self._q
return x
def extra_repr(self):
return "ndigits=" + str(self._ndigits)
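The scale, round, and unscale steps of `Round.forward` can be reproduced on a single float. This is a minimal sketch, assuming scalar inputs; the function name `round_value` is hypothetical. Python's built-in `round` is used as the scalar counterpart of `torch.round`; both resolve exact halves to the nearest even integer.

```python
def round_value(x, ndigits=0):
    q = 10.0 ** ndigits
    # Scale so that the wanted digit sits just before the decimal point,
    # round to the nearest integer, then undo the scaling
    return round(x * q) / q


print(round_value(3.14159, 2))
print(round_value(7.6))
print(round_value(2.5))
```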
forward(self, x)
Slice (Module)
A small torch module for getting the slice of an input tensor
Source code in evotorch/neuroevolution/net/
class Slice(nn.Module):
"""A small torch module for getting the slice of an input tensor"""
def __init__(self, from_index: int, to_index: int):
"""`__init__(...)`: Initialize the Slice operator.
from_index: The index from which the slice begins.
to_index: The exclusive index at which the slice ends.
self._from_index = from_index
self._to_index = to_index
def forward(self, x):
return x[self._from_index : self._to_index]
def extra_repr(self):
return "from_index={}, to_index={}".format(self._from_index, self._to_index)
forward(self, x)
StatefulModule (Module)
Base class for stateful modules. Not to be instantiated directly.
Source code in evotorch/neuroevolution/net/
class StatefulModule(nn.Module):
"""Base class for stateful modules.
Not to be instantiated directly.
def __init__(self, module_class, **kwargs):
assert "batch_first" not in kwargs, "The `batch_first` option is not supported"
self._layer = module_class(**kwargs)
def state(self):
"""Get the tensor of the internal state.
If the recurrent network is just initialized or reset,
then there is no state, so, a None is given.
Not having a state means that an initial internal state tensor of
compatible size with the input will be created at the
first usage of this network.
Each element of this initial internal state tensor is 0.
return self._state
def reset(self):
"""Reset the internal state"""
self._state = None
def forward(self, x):
    if len(x.shape) == 1:
        input_size = x.shape[0]
        x = x.view(1, 1, input_size)
        batch_size = 1
        orgdim = 1
    elif len(x.shape) == 2:
        batch_size, input_size = x.shape
        x = x.view(1, batch_size, input_size)
        orgdim = 2
    else:
        assert False, (
            "expected a tensor with 1 or 2 dimensions, " + "but received a tensor of shape " + str(x.shape)
        )
    if self._state is None:
        x, self._state = self._layer(x)
    else:
        x, self._state = self._layer(x, self._state)
    if orgdim == 1:
        x = x.view(-1)
    elif orgdim == 2:
        x = x.view(batch_size, -1)
    else:
        assert False, "unknown value for orgdim"
    return x
def batch_first(self):
"""Return True if the module expects the batch dimension first.
Otherwise, return False.
return self._layer.batch_first
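The state-keeping pattern of `StatefulModule` (no state on the first call, stored state on later calls, cleared by `reset()`) can be sketched without PyTorch. In the sketch below, `ToyStatefulWrapper` and `running_sum_layer` are hypothetical names; the toy layer simply accumulates its inputs so that the effect of the hidden state is visible.

```python
class ToyStatefulWrapper:
    """Keeps the recurrent state inside the object instead of passing it around."""

    def __init__(self, layer):
        # layer(x) or layer(x, state) must return (output, new_state)
        self._layer = layer
        self._state = None

    def reset(self):
        # Forget the internal state
        self._state = None

    def __call__(self, x):
        if self._state is None:
            # First call after creation or reset: let the layer use its initial state
            y, self._state = self._layer(x)
        else:
            y, self._state = self._layer(x, self._state)
        return y


def running_sum_layer(x, state=0.0):
    # Toy recurrent layer: the state is the sum of all inputs seen so far
    new_state = state + x
    return new_state, new_state


net = ToyStatefulWrapper(running_sum_layer)
outputs = [net(1.0), net(2.0)]
net.reset()
after_reset = net(2.0)
print(outputs, after_reset)
```

The same input (2.0) yields different outputs before and after `reset()`, which is why the state should be reset at episode boundaries.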
StructuredControlNet (Module)
Structured Control Net.
This is a control network consisting of two components: (i) a non-linear component, which is a feed-forward network; and (ii) a linear component, which is a linear layer. Both components take the input vector provided to the structured control network. The final output is the sum of the outputs of both components.
Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018). Structured Control Nets for Deep Reinforcement Learning.
Source code in evotorch/neuroevolution/net/
class StructuredControlNet(nn.Module):
"""Structured Control Net.
This is a control network consisting of two components:
(i) a non-linear component, which is a feed-forward network; and
(ii) a linear component, which is a linear layer.
Both components take the input vector provided to the
structured control network.
The final output is the sum of the outputs of both components.
Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018).
Structured Control Nets for Deep Reinforcement Learning.
def __init__(
in_features: int,
out_features: int,
num_layers: int,
hidden_size: int,
bias: bool = True,
nonlinearity: Union[str, Callable] = "tanh",
"""`__init__(...)`: Initialize the structured control net.
in_features: Length of the input vector
out_features: Length of the output vector
num_layers: Number of hidden layers for the non-linear component
hidden_size: Number of neurons in a hidden layer of the
non-linear component
bias: Whether or not the linear component is to have bias
nonlinearity: Activation function
self._in_features = in_features
self._out_features = out_features
self._num_layers = num_layers
self._hidden_size = hidden_size
self._bias = bias
self._nonlinearity = nonlinearity
self._linear_component = nn.Linear(
in_features=self._in_features, out_features=self._out_features, bias=self._bias
self._nonlinear_component = FeedForwardNet(
list((self._hidden_size, self._nonlinearity) for _ in range(self._num_layers))
+ [(self._out_features, self._nonlinearity)]
def forward(self, x: torch.Tensor) -> torch.Tensor:
return self._linear_component(x) + self._nonlinear_component(x)
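The layer list handed to the non-linear `FeedForwardNet` component follows directly from the constructor arguments. The sketch below rebuilds that list in plain Python; the function name `structured_control_layers` is hypothetical, and the logic mirrors the list expression in the source above.

```python
def structured_control_layers(num_layers, hidden_size, out_features, nonlinearity="tanh"):
    # num_layers hidden layers of equal size ...
    hidden = [(hidden_size, nonlinearity) for _ in range(num_layers)]
    # ... followed by an output layer that uses the same activation function
    return hidden + [(out_features, nonlinearity)]


layers = structured_control_layers(num_layers=2, hidden_size=16, out_features=4)
print(layers)
```

Note that, in this construction, the output layer of the non-linear component is also passed through the chosen activation before being added to the linear component.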
forward(self, x)
Get the number of parameters of the network.
fill_parameters(net, vector)
Fill the parameters of a torch module (net) from a vector.
No gradient information is kept.
The vector's length must exactly match the number of parameters of the PyTorch module.
parameter_vector(net, *, device=None)
Get all the parameters of a torch module (net) into a vector
No gradient information is kept.
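Conceptually, these two utilities convert between one flat vector and the individual parameter blocks of a module. The following dependency-free sketch shows the idea on plain lists; it is not the library's implementation. The names `split_vector` and `join_chunks` are hypothetical, and each parameter block is represented only by its number of elements.

```python
def split_vector(vector, sizes):
    """Cut a flat vector into consecutive chunks, one per parameter block."""
    if len(vector) != sum(sizes):
        raise ValueError("The vector length must match the total number of parameters")
    chunks = []
    start = 0
    for size in sizes:
        chunks.append(list(vector[start:start + size]))
        start += size
    return chunks


def join_chunks(chunks):
    """Concatenate all parameter blocks back into one flat vector."""
    return [value for chunk in chunks for value in chunk]


chunks = split_vector([1, 2, 3, 4, 5], [2, 3])
flat = join_chunks(chunks)
print(chunks, flat)
```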
Utilities for parsing string representations of neural net policies
NetParsingError (Exception)
Representation of a parsing error
Source code in evotorch/neuroevolution/net/
class NetParsingError(Exception):
Representation of a parsing error
def __init__(
message: str,
lineno: Optional[int] = None,
col_offset: Optional[int] = None,
original_error: Optional[Exception] = None,
`__init__(...)`: Initialize the NetParsingError.
message: Error message, as string.
lineno: Erroneous line number in the string representation of the
neural network structure.
col_offset: Erroneous column number in the string representation
of the neural network structure.
original_error: If another error caused this parsing error,
that original error can be attached to this `NetParsingError`
instance via this argument.
self.message = message
self.lineno = lineno
self.col_offset = col_offset
self.original_error = original_error
def _to_string(self) -> str:
parts = []
if self.lineno is not None:
parts.append(" at line(")
parts.append(str(self.lineno - 1))
if self.col_offset is not None:
parts.append(" at column(")
parts.append(str(self.col_offset + 1))
parts.append(": ")
return "".join(parts)
def __str__(self) -> str:
return self._to_string()
def __repr__(self) -> str:
return self._to_string()
__init__(self, message, lineno=None, col_offset=None, original_error=None)
: Initialize the NetParsingError.
str_to_net(s, **constants)
Read a string representation of a neural net structure, and return a torch.nn.Module instance out of it.
Let us imagine that one wants to describe the following neural network structure:
from torch import nn
net = nn.Sequential(
    nn.Linear(8, 16),
    nn.Tanh(),
    nn.Linear(16, 4, bias=False),
    nn.ReLU(),
)
By using str_to_net(...) one can construct the same module via:
from import str_to_net
net = str_to_net(
    'Linear(8, 16) >> Tanh() >> Linear(16, 4, bias=False) >> ReLU()'
)
The string can also be multi-line:
One can also define constants for using them in strings:
net = str_to_net(
Linear(input_size, hidden_size)
>> Tanh()
>> Linear(hidden_size, output_size, bias=False)
>> ReLU()
In the neural net structure string, when one refers to a module type, say, Linear, the name Linear is first searched for in the library's own namespace of layer types, and then in the namespace torch.nn. In the case of Linear, the searched name exists in torch.nn and therefore, the layer type to be instantiated is accepted as torch.nn.Linear. If, instead of Linear, one had used a name that exists in the library's own namespace of layer types, then the layer type to be instantiated would be taken from that namespace.
Notes regarding usage with evotorch.neuroevolution.GymNE
While instantiating a GymNE, one can specify a neural net structure string as the policy. Therefore, while filling the policy string for a GymNE, all the rules mentioned above apply. Additionally, when it uses str_to_net(...), GymNE defines extra constants: one holding the length of the observation vector, one holding the length of the action vector (for continuous-action environments) or the number of actions (for discrete-action environments), and obs_shape (the shape of the observation as a tuple, assuming that the observation space is of type gym.spaces.Box; usable within the string like obs_shape[0], obs_shape[1], etc., or simply obs_shape to refer to the entire tuple).
Therefore, with GymNE, one can define a single-hidden-layered policy via such a string (where one might choose to omit the last Tanh(), as GymNE will clip the output of the final layer to conform with the action boundaries of the environment, which one might think of as a type of hard-tanh anyway).
This namespace provides various RL-specific utilities.
ActClipLayer (Module)
forward(self, x)
ObsNormLayer (Module)
Observation normalization layer for a policy network
Source code in evotorch/neuroevolution/net/
class ObsNormLayer(nn.Module):
"""Observation normalization layer for a policy network"""
def __init__(self, stats: RunningStat, trainable_stats: bool):
"""`__init__(...)`: Initialize the observation normalization layer
stats: The RunningStat object storing the mean and stdev of
all of the observations.
trainable_stats: Whether or not the normalization data
are to be stored as trainable parameters.
mean = torch.tensor(stats.mean, dtype=torch.float32)
stdev = torch.tensor(stats.stdev, dtype=torch.float32)
if trainable_stats:
    self.obs_mean = nn.Parameter(mean)
    self.obs_stdev = nn.Parameter(stdev)
else:
    self.obs_mean = mean
    self.obs_stdev = stdev
def forward(self, x):
x = x - self.obs_mean
x = x / self.obs_stdev
return x
__init__(self, stats, trainable_stats)
: Initialize the observation normalization layer
forward(self, x)
Reset a gym environment.
For gym 1.0, the plan is to have a reset(...) method which returns a two-element tuple (observation, info), where info is an object providing any additional information regarding the initial state of the agent. However, the old (pre-1.0) gym API (and some environments which were written with old gym compatibility in mind) has (or have) a reset(...) method which returns a single object that is the initial observation.
With the assumption that the observation space of the environment is NOT a tuple, this function can work with both pre-1.0 and (hopefully) after-1.0 versions of gym, and always returns the initial observation.
Please do not use this function on environments whose observation spaces are tuples, because then this function cannot distinguish between environments whose reset(...) methods return a tuple and environments whose reset(...) methods return a single observation object but that observation object is a tuple.
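The tuple check described above can be illustrated with a minimal sketch. This is not the library's implementation; `reset_env_sketch`, `OldStyleEnv` and `NewStyleEnv` are hypothetical names introduced only for this illustration, and the sketch assumes (as the text does) that observations are never tuples.

```python
from typing import Any


def reset_env_sketch(env: Any) -> Any:
    """Return only the initial observation, whichever reset() style env uses."""
    result = env.reset()
    if isinstance(result, tuple):
        # Newer API: reset() returned (observation, info).
        if len(result) != 2:
            raise ValueError(f"Expected a 2-element tuple from reset(), got {len(result)} elements")
        observation, _info = result
        return observation
    # Older API: reset() returned the observation itself.
    return result


class OldStyleEnv:
    """Hypothetical stand-in for an environment with the older API."""

    def reset(self):
        return [0.0, 1.0]


class NewStyleEnv:
    """Hypothetical stand-in for an environment with the newer API."""

    def reset(self):
        return [0.0, 1.0], {"note": "extra info"}


old_obs = reset_env_sketch(OldStyleEnv())
new_obs = reset_env_sketch(NewStyleEnv())
```

Both calls yield the same observation, which is why the ambiguity only arises when the observation itself is a tuple.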
take_step_in_env(env, action)
Take a step in the gym environment. Taking a step means performing the action provided via the arguments.
For gym 1.0, the plan is to have a step(...) method which returns a 5-element tuple containing observation, reward, terminated, truncated, info, where terminated is a boolean indicating whether or not the episode is terminated because of the actions taken within the environment, and truncated is a boolean indicating whether or not the episode is finished because the time limit is reached.
However, the old (pre-1.0) gym API (and some environments which were written with old gym compatibility in mind) has (or have) a step(...) method which returns 4 elements: observation, reward, done, info, where done is a boolean indicating whether or not the episode is "done", either because of termination or because of truncation.
This function can work with both pre-1.0 and (hopefully) after-1.0 versions of gym, and always returns the 4-element tuple as its result.
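As a rough sketch of the behavior described above (not the library's own code), the 5-element result can be folded into the 4-element form by combining terminated and truncated into a single done flag. The names `take_step_sketch`, `FourTupleEnv` and `FiveTupleEnv` are hypothetical and exist only for this illustration.

```python
from typing import Any, Tuple


def take_step_sketch(env: Any, action: Any) -> Tuple[Any, float, bool, dict]:
    """Step the environment and always return (observation, reward, done, info)."""
    result = env.step(action)
    if len(result) == 5:
        # Newer API: separate termination and truncation flags.
        observation, reward, terminated, truncated, info = result
        done = bool(terminated or truncated)
    elif len(result) == 4:
        # Older API: a single done flag.
        observation, reward, done, info = result
    else:
        raise ValueError(f"Unexpected number of elements returned by step(): {len(result)}")
    return observation, reward, done, info


class FourTupleEnv:
    """Hypothetical environment with the older step() signature."""

    def step(self, action):
        return [0.5], 1.0, False, {}


class FiveTupleEnv:
    """Hypothetical environment with the newer step() signature."""

    def step(self, action):
        # The episode ends here because of a time limit, not because of the action.
        return [0.5], 1.0, False, True, {}


old_result = take_step_sketch(FourTupleEnv(), action=0)
new_result = take_step_sketch(FiveTupleEnv(), action=0)
```

In this sketch, a truncated episode from the newer API is reported as done, matching the older meaning of the flag.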
Tool for efficiently computing the mean and stdev of arrays. The arrays themselves are not stored separately, instead, they are accumulated.
Source code in evotorch/neuroevolution/net/
class RunningStat:
    """
    Tool for efficiently computing the mean and stdev of arrays.
    The arrays themselves are not stored separately,
    instead, they are accumulated.
    """

    def __init__(self):
        """
        ``__init__(...)``: Initialize the RunningStat.

        In the beginning, the number of arrays is 0,
        and the sum and the sum of squares are set as NaN.
        """
        # self.sum = np.zeros(shape, dtype='float32')
        # self.sumsq = np.full(shape, eps, dtype='float32')
        # self.count = eps
        self.reset()

    def reset(self):
        """Reset the RunningStat to its initial state."""
        self._sum = float("nan")
        self._sumsq = float("nan")
        self._count = 0

    def _increment(self, s, ssq, c):
        # self.sum += s
        # self.sumsq += ssq
        # self.count += c
        if self._count == 0:
            # First data: replace the NaN placeholders with arrays
            self._sum = np.array(s, dtype="float32")
            self._sumsq = np.array(ssq, dtype="float32")
        else:
            self._sum += s
            self._sumsq += ssq
        self._count += c

    @property
    def count(self) -> int:
        """Get the number of arrays accumulated."""
        return self._count

    @property
    def sum(self) -> np.ndarray:
        """Get the sum of all accumulated arrays."""
        return self._sum

    @property
    def sum_of_squares(self) -> np.ndarray:
        """Get the sum of squares of all accumulated arrays."""
        return self._sumsq

    @property
    def mean(self) -> np.ndarray:
        """Get the mean of all accumulated arrays."""
        return self._sum / self._count

    @property
    def stdev(self) -> np.ndarray:
        """Get the standard deviation of all accumulated arrays."""
        # The variance is clipped from below at 1e-2 before taking the square root
        return np.sqrt(np.maximum(self._sumsq / self._count - np.square(self.mean), 1e-2))

    # def _set_from_init(self, init_mean, init_std, init_count):
    #     init_mean = np.asarray(init_mean, dtype='float32')
    #     init_std = np.asarray(init_std, dtype='float32')
    #     self._sum = init_mean * init_count
    #     self._sumsq = (np.square(init_mean) + np.square(init_std)) * init_count
    #     self._count = init_count

    def update(self, x: Union[np.ndarray, "RunningStat"]):
        """
        Accumulate more data into the RunningStat object.
        If the argument is an array, that array is added
        as one more data element.
        If the argument is another RunningStat instance,
        all the stats accumulated by that RunningStat object
        are added into this RunningStat object.
        """
        if isinstance(x, RunningStat):
            if x.count > 0:
                self._increment(x.sum, x.sum_of_squares, x.count)
        else:
            self._increment(x, np.square(x), 1)

    def normalize(self, x: Union[np.ndarray, list]) -> np.ndarray:
        """Normalize the array x according to the accumulated stats."""
        x = np.array(x, dtype="float32")
        x -= self.mean
        x /= self.stdev
        return x

    def __copy__(self):
        return deepcopy(self)

    def __get_repr(self):
        return "<RunningStat, count: " + str(self._count) + ">"

    def __str__(self):
        return self.__get_repr()

    def __repr__(self):
        return self.__get_repr()
Normalize the array x according to the accumulated stats.
update(self, x)
Accumulate more data into the RunningStat object. If the argument is an array, that array is added as one more data element. If the argument is another RunningStat instance, all the stats accumulated by that RunningStat object are added into this RunningStat object.
Source code in evotorch/neuroevolution/net/
def update(self, x: Union[np.ndarray, "RunningStat"]):
    """
    Accumulate more data into the RunningStat object.
    If the argument is an array, that array is added
    as one more data element.
    If the argument is another RunningStat instance,
    all the stats accumulated by that RunningStat object
    are added into this RunningStat object.
    """
    if isinstance(x, RunningStat):
        if x.count > 0:
            self._increment(x.sum, x.sum_of_squares, x.count)
    else:
        self._increment(x, np.square(x), 1)
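The reason merging works is that the sum, the sum of squares and the count are all additive, so two accumulators can be combined without revisiting the data. The following is a simplified scalar sketch of that idea, not the class listed above: `ScalarRunningStat` is a hypothetical name, it starts from zeros rather than NaN, and it works on plain floats instead of arrays. The lower bound of 1e-2 on the variance mirrors the listing.

```python
import math


class ScalarRunningStat:
    """Hypothetical scalar analogue of an accumulate-and-merge statistics tracker."""

    def __init__(self):
        self.total = 0.0
        self.total_sq = 0.0
        self.count = 0

    def update(self, x):
        if isinstance(x, ScalarRunningStat):
            # Merging: all three accumulated quantities simply add up.
            self.total += x.total
            self.total_sq += x.total_sq
            self.count += x.count
        else:
            # A single new data element.
            self.total += x
            self.total_sq += x * x
            self.count += 1

    @property
    def mean(self):
        return self.total / self.count

    @property
    def stdev(self):
        variance = self.total_sq / self.count - self.mean ** 2
        # Clip the variance from below, as the listing above does.
        return math.sqrt(max(variance, 1e-2))


worker_a = ScalarRunningStat()
for value in (1.0, 2.0):
    worker_a.update(value)

worker_b = ScalarRunningStat()
for value in (3.0, 6.0):
    worker_b.update(value)

merged = ScalarRunningStat()
merged.update(worker_a)
merged.update(worker_b)

direct = ScalarRunningStat()
for value in (1.0, 2.0, 3.0, 6.0):
    direct.update(value)
```

Here `merged` and `direct` hold the same statistics, which is what allows observation statistics collected by separate workers to be combined afterwards.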
SupervisedNE (NEProblem)
Representation of a neuro-evolution problem where the goal is to minimize a loss function in a supervised learning setting.
A supervised learning problem can be defined via subclassing this class and overriding the methods _loss(y_hat, y) (which is to define how the loss is computed) and _make_dataloader() (which is to define how a new DataLoader is created).
Alternatively, this class can be directly instantiated as follows:
def my_loss_function(output_of_network, desired_output):
loss = ... # compute the loss here
return loss
problem = SupervisedNE(
Source code in evotorch/neuroevolution/
class SupervisedNE(NEProblem):
Representation of a neuro-evolution problem where the goal is to minimize
a loss function in a supervised learning setting.
A supervised learning problem can be defined via subclassing this class
and overriding the methods
`_loss(y_hat, y)` (which is to define how the loss is computed)
and `_make_dataloader()` (which is to define how a new DataLoader is created).
Alternatively, this class can be directly instantiated as follows:
def my_loss_function(output_of_network, desired_output):
loss = ... # compute the loss here
return loss
problem = SupervisedNE(
def __init__(
dataset: Dataset,
network: Union[str, nn.Module, Callable[[], nn.Module]],
loss_func: Optional[Callable] = None,
network_args: Optional[dict] = None,
initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
minibatch_size: Optional[int] = None,
num_minibatches: Optional[int] = None,
num_actors: Optional[Union[int, str]] = "num_devices",
common_minibatch: bool = True,
num_gpus_per_actor: Optional[Union[int, float, str]] = None,
actor_config: Optional[dict] = None,
num_subbatches: Optional[int] = None,
subbatch_size: Optional[int] = None,
device: Optional[Device] = None,
`__init__(...)`: Initialize the SupervisedNE.
dataset: The Dataset from which the minibatches will be pulled
network: A network structure string, or a Callable (which can be
a class inheriting from `torch.nn.Module`, or a function
which returns a `torch.nn.Module` instance), or an instance
of `torch.nn.Module`.
The object provided here determines the structure of the
neural network whose parameters will be evolved.
A network structure string is a string which can be processed
by ``.
Please see the documentation of the function
`` to see how such
a neural network structure string looks like.
loss_func: Optionally a function (or a Callable object) which
receives `y_hat` (the output generated by the neural network)
and `y` (the desired output), and returns the loss as a scalar.
This argument can also be left as None, in which case it will
be expected that the method `_loss(self, y_hat, y)` is
overridden by the inheriting class.
network_args: Optionally a dict-like object, storing keyword
arguments to be passed to the network while instantiating it.
initial_bounds: Specifies an interval from which the values of the
initial neural network parameters will be drawn.
minibatch_size: Optionally an integer, describing the size of a
minibatch when pulling data from the dataset.
Can also be left as None, in which case it will be expected
that the inheriting class overrides the method
`_make_dataloader()` and defines how a new DataLoader is to be
created.
num_minibatches: An integer, specifying over how many minibatches
a single neural network will be evaluated.
If not specified, it will be assumed that the desired number
of minibatches per network evaluation is 1.
num_actors: Number of actors to create for parallelized
evaluation of the solutions.
Certain string values are also accepted.
When given as "max" or as "num_cpus", the number of actors
will be equal to the number of all available CPUs in the ray
cluster.
When given as "num_gpus", the number of actors will be
equal to the number of all available GPUs in the ray
cluster, and each actor will be assigned a GPU.
When given as "num_devices", the number of actors will be
equal to the minimum among the number of CPUs and the number
of GPUs available in the cluster (or will be equal to the
number of CPUs if there is no GPU), and each actor will be
assigned a GPU (if available).
If `num_actors` is given as "num_gpus" or "num_devices",
the argument `num_gpus_per_actor` must not be used,
and the `actor_config` dictionary must not contain the
key "num_gpus".
If `num_actors` is given as something other than "num_gpus"
or "num_devices", and if you wish to assign GPUs to each
actor, then please see the argument `num_gpus_per_actor`.
common_minibatch: Whether or not the same minibatches will be
used when evaluating the solutions.
actor_config: A dictionary, representing the keyword arguments
to be passed to the options(...) used when creating the
ray actor objects. To be used for explicitly allocating
resources per each actor.
For example, for declaring that each actor is to use a GPU,
one can pass `actor_config=dict(num_gpus=1)`.
Can also be given as None (which is the default),
if no such options are to be passed.
num_gpus_per_actor: Number of GPUs to be allocated by each
remote actor.
The default behavior is to NOT allocate any GPU at all
(which is the default behavior of the ray library as well).
When given as a number `n`, each actor will be given
`n` GPUs (where `n` can be an integer, or can be a `float`
for fractional allocation).
When given as a string "max", then the available GPUs
across the entire ray cluster (or within the local computer
in the simplest cases) will be equally distributed among
the actors.
When given as a string "all", then each actor will have
access to all the GPUs (this will be achieved by suppressing
the environment variable `CUDA_VISIBLE_DEVICES` for each
actor).
When the problem is not distributed (i.e. when there are
no actors), this argument is expected to be left as None.
num_subbatches: If `num_subbatches` is None (assuming that
`subbatch_size` is also None), then, when evaluating a
population, the population will be split into n pieces, `n`
being the number of actors, and each actor will evaluate
its assigned piece. If `num_subbatches` is an integer `m`,
then the population will be split into `m` pieces,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
While using a distributed algorithm, this argument determines
how many sub-batches will be generated, and therefore,
how many gradients will be computed by the remote actors.
subbatch_size: If `subbatch_size` is None (assuming that
`num_subbatches` is also None), then, when evaluating a
population, the population will be split into `n` pieces, `n`
being the number of actors, and each actor will evaluate its
assigned piece. If `subbatch_size` is an integer `m`,
then the population will be split into pieces of size `m`,
and actors will continually accept the next unevaluated
piece as they finish their current tasks.
When there can be significant difference across the solutions
in terms of computational requirements, specifying a
`subbatch_size` can be beneficial, because, while one
actor is busy with a subbatch containing computationally
challenging solutions, other actors can accept more
tasks and save time.
The arguments `num_subbatches` and `subbatch_size` cannot
be given values other than None at the same time.
While using a distributed algorithm, this argument determines
the size of a sub-batch (or sub-population) sampled by a
remote actor for computing a gradient.
In distributed mode, it is expected that the population size
is divisible by `subbatch_size`.
device: Default device in which a new population will be generated
and the neural networks will operate.
If not specified, "cpu" will be used.
self.dataset = dataset
self.dataloader: DataLoader = None
self._loss_func = loss_func
self._minibatch_size = None if minibatch_size is None else int(minibatch_size)
self._num_minibatches = 1 if num_minibatches is None else int(num_minibatches)
self._common_minibatch = common_minibatch
self._current_minibatches: Optional[list] = None
    def _make_dataloader(self) -> DataLoader:
        """
        Make a new DataLoader.

        This method, in its default state, does not contain an implementation.
        In the case where the `__init__` of `SupervisedNE` is not provided
        with a minibatch size, it will be expected that this method is
        overridden by the inheriting class and that the operation of creating
        a new DataLoader is defined here.

        Returns:
            The new DataLoader.
        """
        raise NotImplementedError
    def make_dataloader(self) -> DataLoader:
        """
        Make a new DataLoader.

        If the `__init__` of `SupervisedNE` was provided with a minibatch size
        via the argument `minibatch_size`, then a new DataLoader will be made
        with that minibatch size.
        Otherwise, it will be expected that the method `_make_dataloader(...)`
        was overridden to contain details regarding how the DataLoader should be
        created, and that method will be executed.

        Returns:
            The created DataLoader.
        """
        if self._minibatch_size is None:
            return self._make_dataloader()
        else:
            return DataLoader(self.dataset, shuffle=True, batch_size=self._minibatch_size)
    def _evaluate_using_minibatch(self, network: nn.Module, batch: Any) -> Union[float, torch.Tensor]:
        """
        Pass a minibatch through a network, and compute the loss.

        Args:
            network: The network using which the loss will be computed.
            batch: The minibatch that will be used as data.
        Returns:
            The loss.
        """
        with torch.no_grad():
            x, y = batch
            yhat = network(x)
            return self.loss(yhat, y)
    def _loss(self, y_hat: Any, y: Any) -> Union[float, torch.Tensor]:
        """
        The loss function.

        This method, in its default state, does not contain an implementation.
        In the case where `__init__` of `SupervisedNE` class was not given
        a loss function via the argument `loss_func`, it will be expected
        that this method is overridden by the inheriting class and that the
        operation of computing the loss is defined here.

        Args:
            y_hat: The output estimated by the network
            y: The desired output
        Returns:
            A scalar, representing the loss
        """
        raise NotImplementedError
    def loss(self, y_hat: Any, y: Any) -> Union[float, torch.Tensor]:
        """
        Run the loss function and return the loss.

        If the `__init__` of `SupervisedNE` class was given a loss
        function via the argument `loss_func`, then that loss function
        will be used. Otherwise, it will be expected that the method
        `_loss(...)` is overridden with a loss definition, and that method
        will be used to compute the loss.
        The computed loss will be returned.

        Args:
            y_hat: The output estimated by the network
            y: The desired output
        Returns:
            A scalar, representing the loss
        """
        if self._loss_func is None:
            return self._loss(y_hat, y)
        else:
            return self._loss_func(y_hat, y)
    def _prepare(self) -> None:
        self.dataloader = self.make_dataloader()

    def get_minibatch(self) -> Any:
        """
        Get the next minibatch from the DataLoader.
        """
        if self.dataloader is None:
            # No DataLoader yet: make one now
            self._prepare()

        try:
            batch = next(self.dataloader_iterator)
            if batch is None:
                # Restart the iteration over the DataLoader
                self.dataloader_iterator = iter(self.dataloader)
                batch = self.get_minibatch()
        except Exception:
            # The iterator is missing or exhausted: restart the iteration
            self.dataloader_iterator = iter(self.dataloader)
            batch = self.get_minibatch()

        # Move batch to device of network
        return [ for var in batch]
    def _evaluate_network(self, network: nn.Module) -> torch.Tensor:
        loss = 0.0
        for batch_idx in range(self._num_minibatches):
            if not self._common_minibatch:
                # Each network gets freshly drawn minibatches
                self._current_minibatch = self.get_minibatch()
            else:
                # All networks share the minibatches prepared in _evaluate_batch
                self._current_minibatch = self._current_minibatches[batch_idx]
            # Accumulate the mean loss over the minibatches
            loss += self._evaluate_using_minibatch(network, self._current_minibatch) / self._num_minibatches
        return loss
    def _evaluate_batch(self, batch: SolutionBatch):
        if self._common_minibatch:
            # If using a common data batch, generate them now and use them for the entire batch of solutions
            self._current_minibatches = [self.get_minibatch() for _ in range(self._num_minibatches)]
        return super()._evaluate_batch(batch)
__init__(self, dataset, network, loss_func=None, *, network_args=None, initial_bounds=(-1e-05, 1e-05), minibatch_size=None, num_minibatches=None, num_actors='num_devices', common_minibatch=True, num_gpus_per_actor=None, actor_config=None, num_subbatches=None, subbatch_size=None, device=None)
: Initialize the SupervisedNE.
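To make the difference between the `num_subbatches` and `subbatch_size` arguments of `__init__` concrete, here is an illustrative sketch that only computes piece sizes. It is not how the library splits a population internally: `split_sizes_sketch` is a hypothetical helper, and the handling of remainders (spreading them over the first pieces, or leaving a smaller last piece) is an assumption made for this example.

```python
from typing import List, Optional


def split_sizes_sketch(
    popsize: int,
    num_actors: int,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
) -> List[int]:
    """Return the sizes of the pieces a population of `popsize` would be split into."""
    if num_subbatches is not None and subbatch_size is not None:
        # The two arguments are mutually exclusive.
        raise ValueError("num_subbatches and subbatch_size cannot both be given")
    if subbatch_size is not None:
        # Pieces of a fixed size; the last one may be smaller (assumption).
        full, rest = divmod(popsize, subbatch_size)
        return [subbatch_size] * full + ([rest] if rest else [])
    # Either one piece per actor (default), or the requested number of pieces.
    num_pieces = num_actors if num_subbatches is None else num_subbatches
    base, extra = divmod(popsize, num_pieces)
    return [base + 1 if i < extra else base for i in range(num_pieces)]


per_actor = split_sizes_sketch(10, num_actors=4)
by_count = split_sizes_sketch(10, num_actors=4, num_subbatches=5)
by_size = split_sizes_sketch(10, num_actors=4, subbatch_size=4)
```

With more pieces than actors, an actor that finishes early can pick up the next unevaluated piece, which is the load-balancing benefit mentioned in the argument descriptions.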
Get the next minibatch from the DataLoader.
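The restart-on-exhaustion behavior can be sketched independently of PyTorch. The class below is a simplified illustration, not the method itself: `MinibatchCycler` is a hypothetical name, a plain list stands in for the DataLoader, and only `StopIteration` is caught (the source listing catches exceptions more broadly and also moves the batch to the network's device).

```python
from typing import Any, Iterable


class MinibatchCycler:
    """Hypothetical helper which restarts iteration once the loader is exhausted."""

    def __init__(self, loader: Iterable[Any]):
        self._loader = loader
        self._iterator = iter(loader)

    def get_minibatch(self) -> Any:
        try:
            return next(self._iterator)
        except StopIteration:
            # All minibatches were consumed: start a new pass over the data.
            self._iterator = iter(self._loader)
            return next(self._iterator)


# A list of two "minibatches" stands in for a DataLoader here.
cycler = MinibatchCycler([[1, 2], [3, 4]])
first = cycler.get_minibatch()
second = cycler.get_minibatch()
third = cycler.get_minibatch()
```

The third call wraps around to the first minibatch, so callers can keep requesting minibatches indefinitely.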
loss(self, y_hat, y)
Run the loss function and return the loss.
If the __init__ of the SupervisedNE class was given a loss function via the argument loss_func, then that loss function will be used. Otherwise, it will be expected that the method _loss(...) is overridden with a loss definition, and that method will be used to compute the loss.
The computed loss will be returned.
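The two ways of supplying a loss (passing a function, or overriding `_loss`) follow a simple dispatch pattern, sketched below without any dependency on the library. `LossDispatchSketch`, `AbsoluteErrorProblem` and `squared_error` are hypothetical names used only here, and plain floats stand in for tensors.

```python
from typing import Callable, Optional


class LossDispatchSketch:
    """Hypothetical class mimicking the loss_func / _loss dispatch."""

    def __init__(self, loss_func: Optional[Callable] = None):
        self._loss_func = loss_func

    def _loss(self, y_hat, y):
        # Meant to be overridden when no loss_func is given.
        raise NotImplementedError

    def loss(self, y_hat, y):
        if self._loss_func is None:
            return self._loss(y_hat, y)
        return self._loss_func(y_hat, y)


def squared_error(y_hat, y):
    return (y_hat - y) ** 2


class AbsoluteErrorProblem(LossDispatchSketch):
    """Supplies the loss by overriding _loss instead of passing a function."""

    def _loss(self, y_hat, y):
        return abs(y_hat - y)


via_argument = LossDispatchSketch(squared_error).loss(3.0, 1.0)
via_override = AbsoluteErrorProblem().loss(3.0, 1.0)
```

If neither a function nor an override is provided, calling `loss(...)` in this sketch raises `NotImplementedError`.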
make_dataloader(self)
If the __init__ of SupervisedNE was provided with a minibatch size via the argument minibatch_size, then a new DataLoader will be made with that minibatch size.
Otherwise, it will be expected that the method _make_dataloader(...) was overridden to contain details regarding how the DataLoader should be created, and that method will be executed.
