Problem types for neuroevolution

baseneproblem

BaseNEProblem (Problem)

This is the base class for all neuro-evolution problems.

Currently, this class does not offer any additional functionality. Its purpose is to collect all neuro-evolution problems under the same branch of inheritance.

Source code in evotorch/neuroevolution/baseneproblem.py
class BaseNEProblem(Problem):
    """
    This is the base class for all neuro-evolution problems.

    Currently, this class does not offer any additional functionality.
    Its purpose is to collect all neuro-evolution problems under the same
    branch of inheritance.
    """

    pass
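
Since all neuro-evolution problems share this branch of inheritance, generic tooling can recognize them with a single `isinstance` check. A minimal sketch (assuming the `gym` environment "CartPole-v1" is installed; the network string is illustrative):

from evotorch.neuroevolution import BaseNEProblem, GymNE

problem = GymNE(env="CartPole-v1", network="Linear(obs_length, act_length)")

# Every neuro-evolution problem type (GymNE, NEProblem subclasses, etc.)
# inherits from BaseNEProblem, so one check covers them all.
assert isinstance(problem, BaseNEProblem)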

gymne

This namespace contains the GymNE class.

GymNE (NEProblem)

Representation of a NEProblem where the goal is to maximize the total reward obtained in a gym environment.
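
As a brief usage sketch (the environment name, network string, and hyperparameter values below are illustrative assumptions, not recommendations):

from evotorch.algorithms import PGPE
from evotorch.logging import StdOutLogger
from evotorch.neuroevolution import GymNE

# Maximize the total reward on CartPole with a small linear policy.
problem = GymNE(
    env="CartPole-v1",
    network="Linear(obs_length, act_length)",
    observation_normalization=True,
    num_actors="max",  # one actor per available CPU
)

searcher = PGPE(
    problem,
    popsize=50,
    center_learning_rate=0.01,
    stdev_learning_rate=0.1,
    radius_init=0.3,
)
StdOutLogger(searcher)  # print the status of each generation
searcher.run(10)  # evolve for 10 generations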

Source code in evotorch/neuroevolution/gymne.py
class GymNE(NEProblem):
    """
    Representation of a NEProblem where the goal is to maximize
    the total reward obtained in a `gym` environment.
    """

    def __init__(
        self,
        env: Optional[Union[str, Callable]] = None,
        network: Optional[Union[str, nn.Module, Callable[[], nn.Module]]] = None,
        *,
        env_name: Optional[Union[str, Callable]] = None,
        network_args: Optional[dict] = None,
        env_config: Optional[Mapping] = None,
        observation_normalization: bool = False,
        num_episodes: int = 1,
        episode_length: Optional[int] = None,
        decrease_rewards_by: Optional[float] = None,
        alive_bonus_schedule: Optional[tuple] = None,
        action_noise_stdev: Optional[float] = None,
        num_actors: Optional[Union[int, str]] = None,
        actor_config: Optional[dict] = None,
        num_subbatches: Optional[int] = None,
        subbatch_size: Optional[int] = None,
        initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
    ):
        """
        `__init__(...)`: Initialize the GymNE.

        Args:
            env: The gym environment to solve. Expected as a Callable
                (maybe a function returning a gym.Env, or maybe a gym.Env
                subclass), or as a string referring to a gym environment
                ID (e.g. "Ant-v4", "Humanoid-v4", etc.).
            network: A network structure string, or a Callable (which can be
                a class inheriting from `torch.nn.Module`, or a function
                which returns a `torch.nn.Module` instance), or an instance
                of `torch.nn.Module`.
                The object provided here determines the structure of the
                neural network policy whose parameters will be evolved.
                A network structure string is a string which can be processed
                by `evotorch.neuroevolution.net.str_to_net(...)`.
                Please see the documentation of the function
                `evotorch.neuroevolution.net.str_to_net(...)` to see what such
                a neural network structure string looks like.
                Note that this network can be a recurrent network.
                When the network's `forward(...)` method can optionally accept
                an additional positional argument for the hidden state of the
                network and returns an additional value for its next state,
                then the policy is treated as a recurrent one.
                When the network is given as a callable object (e.g.
                a subclass of `nn.Module` or a function) and this callable
                object is decorated via `evotorch.decorators.pass_info`,
                the following keyword arguments will be passed:
                (i) `obs_length` (the length of the observation vector),
                (ii) `act_length` (the length of the action vector),
                (iii) `obs_shape` (the shape tuple of the observation space),
                (iv) `act_shape` (the shape tuple of the action space),
                (v) `obs_space` (the Box object specifying the observation
                space), and
                (vi) `act_space` (the Box object specifying the action
                space). Note that `act_space` will always be given as a
                `gym.spaces.Box` instance, even when the actual gym
                environment has a discrete action space. This is because `GymNE`
                always expects the neural network to return a tensor of
                floating-point numbers.
            env_name: Deprecated alias for the keyword argument `env`.
                It is recommended to use the argument `env` instead.
            network_args: Optionally a dict-like object, storing keyword
                arguments to be passed to the network while instantiating it.
            env_config: Keyword arguments to pass to `gym.make(...)` while
                creating the `gym` environment.
            observation_normalization: Whether or not to do online observation
                normalization.
            num_episodes: Number of episodes over which a single solution will
                be evaluated.
            episode_length: Maximum number of simulator interactions allowed
                in a single episode. If left as None, whether or not an episode
                is terminated is determined only by the `gym` environment
                itself.
            decrease_rewards_by: Some gym environments are defined in such a way that
                the agent gets a constant reward for each timestep
                it survives. This constant reward can also be called
                "survival bonus". Such a rewarding scheme can lead the
                evolution to local optima where the agent does nothing
                but does not die either, just to collect the survival
                bonuses. To prevent this, it can be desired to
                remove the survival bonuses from each reward obtained.
                If this is the case with the problem at hand,
                the user can set the argument `decrease_rewards_by`
                to a positive float number, and that number will
                be subtracted from each reward.
            alive_bonus_schedule: Use this to add a customized amount of
                alive bonus.
                If left as None (which is the default), additional alive
                bonus will not be added.
                If given as a tuple `(t, b)`, an alive bonus `b` will be
                added onto all the rewards beyond the timestep `t`.
                If given as a tuple `(t0, t1, b)`, a partial (linearly
                increasing towards `b`) alive bonus will be added onto
                all the rewards between the timesteps `t0` and `t1`,
                and a full alive bonus (which equals to `b`) will be added
                onto all the rewards beyond the timestep `t1`.
            action_noise_stdev: If given as a real number `s`, then, for
                each generated action, Gaussian noise with standard
                deviation `s` will be sampled, and then this sampled noise
                will be added onto the action.
                If action noise is not desired, then this argument can be
                left as None.
            num_actors: Number of actors to create for parallelized
                evaluation of the solutions.
                One can also set this as "max", which means that
                an actor will be created on each available CPU.
                When the parallelization is enabled, each actor will have its
                own instance of the `gym` environment.
            actor_config: A dictionary, representing the keyword arguments
                to be passed to the options(...) used when creating the
                ray actor objects. To be used for explicitly allocating
                resources for each actor.
                For example, for declaring that each actor is to use a GPU,
                one can pass `actor_config=dict(num_gpus=1)`.
                Can also be given as None (which is the default),
                if no such options are to be passed.
            num_subbatches: If `num_subbatches` is None (assuming that
                `subbatch_size` is also None), then, when evaluating a
                population, the population will be split into n pieces, `n`
                being the number of actors, and each actor will evaluate
                its assigned piece. If `num_subbatches` is an integer `m`,
                then the population will be split into `m` pieces,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
            subbatch_size: If `subbatch_size` is None (assuming that
                `num_subbatches` is also None), then, when evaluating a
                population, the population will be split into `n` pieces, `n`
                being the number of actors, and each actor will evaluate its
                assigned piece. If `subbatch_size` is an integer `m`,
                then the population will be split into pieces of size `m`,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                When there can be significant difference across the solutions
                in terms of computational requirements, specifying a
                `subbatch_size` can be beneficial, because, while one
                actor is busy with a subbatch containing computationally
                challenging solutions, other actors can accept more
                tasks and save time.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
            initial_bounds: Specifies an interval from which the values of the
                initial policy parameters will be drawn.
        """
        # Store various environment information
        if (env is not None) and (env_name is None):
            self._env_maker = env
        elif (env is None) and (env_name is not None):
            self._env_maker = env_name
        elif (env is not None) and (env_name is not None):
            raise ValueError(
                f"Received values for both `env` ({repr(env)}) and `env_name` ({repr(env_name)})."
                f" Please specify the environment to solve via only one of these arguments, not both."
            )
        else:
            raise ValueError("Environment name is missing. Please specify it via the argument `env`.")

        # Make sure that the network argument is not missing.
        if network is None:
            raise ValueError(
                "Received None via the argument `network`."
                "Please provide the network as a string, or as a `Callable`, or as a `torch.nn.Module` instance."
            )

        # Store various environment information
        self._env_config = {} if env_config is None else deepcopy(dict(env_config))
        self._decrease_rewards_by = 0.0 if decrease_rewards_by is None else float(decrease_rewards_by)
        self._alive_bonus_schedule = alive_bonus_schedule
        self._action_noise_stdev = None if action_noise_stdev is None else float(action_noise_stdev)
        self._observation_normalization = bool(observation_normalization)
        self._num_episodes = int(num_episodes)
        self._episode_length = None if episode_length is None else int(episode_length)

        self._info_keys = dict(cumulative_reward="avg", interaction_count="sum")

        self._env: Optional[gym.Env] = None

        self._obs_stats: Optional[RunningStat] = None
        self._collected_stats: Optional[RunningStat] = None

        # Create a temporary environment to read its dimensions
        tmp_env = _make_env(self._env_maker, **(self._env_config))

        # Store the temporary environment's dimensions
        self._obs_length = len(tmp_env.observation_space.low)

        if isinstance(tmp_env.action_space, gym.spaces.Discrete):
            self._act_length = tmp_env.action_space.n
            self._box_act_space = gym.spaces.Box(low=float("-inf"), high=float("inf"), shape=(self._act_length,))
        else:
            self._act_length = len(tmp_env.action_space.low)
            self._box_act_space = tmp_env.action_space

        self._act_space = tmp_env.action_space
        self._obs_space = tmp_env.observation_space
        self._obs_shape = tmp_env.observation_space.low.shape

        # Validate the space types of the environment
        ensure_space_types(tmp_env)

        if self._observation_normalization:
            self._obs_stats = RunningStat()
            self._collected_stats = RunningStat()
        else:
            self._obs_stats = None
            self._collected_stats = None
        self._interaction_count: int = 0
        self._episode_count: int = 0

        super().__init__(
            objective_sense="max",  # RL is maximization
            network=network,  # Using the policy as the network
            network_args=network_args,
            initial_bounds=initial_bounds,
            num_actors=num_actors,
            actor_config=actor_config,
            subbatch_size=subbatch_size,
            device="cpu",
        )

        self.after_eval_hook.append(self._extra_status)

    @property
    def _network_constants(self) -> dict:
        return {
            "obs_length": self._obs_length,
            "act_length": self._act_length,
            "obs_space": self._obs_space,
            "act_space": self._box_act_space,
            "obs_shape": self._obs_space.shape,
            "act_shape": self._box_act_space.shape,
        }

    @property
    def _str_network_constants(self) -> dict:
        return {
            "obs_space": self._obs_space.shape,
            "act_space": self._box_act_space.shape,
        }

    def _instantiate_new_env(self, **kwargs) -> gym.Env:
        env_config = {**kwargs, **(self._env_config)}
        env = _make_env(self._env_maker, **env_config)
        if self._alive_bonus_schedule is not None:
            env = AliveBonusScheduleWrapper(env, self._alive_bonus_schedule)
        return env

    def _get_env(self) -> gym.Env:
        if self._env is None:
            self._env = self._instantiate_new_env()
        return self._env

    def _normalize_observation(self, observation: Iterable, *, update_stats: bool = True) -> Iterable:
        observation = np.asarray(observation, dtype="float32")
        if self.observation_normalization:
            if update_stats:
                self._obs_stats.update(observation)
                self._collected_stats.update(observation)
            return self._obs_stats.normalize(observation)
        else:
            return observation

    def _use_policy(self, observation: Iterable, policy: nn.Module) -> Iterable:
        with torch.no_grad():
            result = policy(torch.as_tensor(observation, dtype=torch.float32, device="cpu")).numpy()
        if self._action_noise_stdev is not None:
            result = (
                result
                + self.make_gaussian(len(result), center=0.0, stdev=self._action_noise_stdev, device="cpu").numpy()
            )
        env = self._get_env()
        if isinstance(env.action_space, gym.spaces.Discrete):
            result = np.argmax(result)
        elif isinstance(env.action_space, gym.spaces.Box):
            result = np.clip(result, env.action_space.low, env.action_space.high)
        return result

    def _prepare(self) -> None:
        super()._prepare()
        self._get_env()

    @property
    def network_device(self) -> Device:
        """The device on which the problem should place data e.g. the network
        In the case of GymNE, supported Gym environments return numpy arrays on CPU which are converted to Tensors
        Therefore, it is almost always optimal to place the network on CPU
        """
        return torch.device("cpu")

    def _rollout(
        self,
        *,
        policy: nn.Module,
        update_stats: bool = True,
        visualize: bool = False,
        decrease_rewards_by: Optional[float] = None,
    ) -> dict:
        """Peform a rollout of a network"""
        if decrease_rewards_by is None:
            decrease_rewards_by = self._decrease_rewards_by
        else:
            decrease_rewards_by = float(decrease_rewards_by)

        policy = ensure_stateful(policy)
        policy.reset()

        if visualize:
            env = self._instantiate_new_env(render_mode="human")
        else:
            env = self._get_env()

        observation = self._normalize_observation(reset_env(env), update_stats=update_stats)
        if visualize:
            env.render()
        t = 0

        cumulative_reward = 0.0

        while True:
            observation, raw_reward, done, info = take_step_in_env(env, self._use_policy(observation, policy))
            reward = raw_reward - decrease_rewards_by
            t += 1
            if update_stats:
                self._interaction_count += 1

            if visualize:
                env.render()

            observation = self._normalize_observation(observation, update_stats=update_stats)

            cumulative_reward += reward

            if done or ((self._episode_length is not None) and (t >= self._episode_length)):
                if update_stats:
                    self._episode_count += 1

                final_info = dict(cumulative_reward=cumulative_reward, interaction_count=t)

                for k in self._info_keys:
                    if k not in final_info:
                        final_info[k] = info[k]

                return final_info

    @property
    def _nonserialized_attribs(self) -> List[str]:
        return super()._nonserialized_attribs + ["_env"]

    def run(
        self,
        policy: Union[nn.Module, Iterable],
        *,
        update_stats: bool = False,
        visualize: bool = False,
        num_episodes: Optional[int] = None,
        decrease_rewards_by: Optional[float] = None,
    ) -> dict:
        """
        Evaluate the policy on the gym environment.

        Args:
            policy: The policy to be evaluated. This can be a torch module
                or a sequence of real numbers representing the parameters
                of a policy network.
            update_stats: Whether or not to update the observation
                normalization data while running the policy. If observation
                normalization is not enabled, then this argument will be
                ignored.
            visualize: Whether or not to render the environment while running
                the policy.
            num_episodes: Over how many episodes will the policy be evaluated.
                Expected as None (which is the default), or as an integer.
                If given as None, then the `num_episodes` value that was given
                while initializing this GymNE will be used.
            decrease_rewards_by: How much each reward value should be
                decreased. If left as None, the `decrease_rewards_by` value
                that was given while initializing this GymNE will be
                used.
        Returns:
            A dictionary containing the score and the timestep count.
        """
        if not isinstance(policy, nn.Module):
            policy = self.make_net(policy)

        if num_episodes is None:
            num_episodes = self._num_episodes

        try:
            policy.eval()

            episode_results = [
                self._rollout(
                    policy=policy,
                    update_stats=update_stats,
                    visualize=visualize,
                    decrease_rewards_by=decrease_rewards_by,
                )
                for _ in range(num_episodes)
            ]

            results = _accumulate_all_across_dicts(episode_results, self._info_keys)
            return results
        finally:
            policy.train()

    def visualize(
        self,
        policy: Union[nn.Module, Iterable],
        *,
        update_stats: bool = False,
        num_episodes: Optional[int] = 1,
        decrease_rewards_by: Optional[float] = None,
    ) -> dict:
        """
        Evaluate the policy and render its actions in the environment.

        Args:
            policy: The policy to be evaluated. This can be a torch module
                or a sequence of real numbers representing the parameters
                of a policy network.
            update_stats: Whether or not to update the observation
                normalization data while running the policy. If observation
                normalization is not enabled, then this argument will be
                ignored.
            num_episodes: Over how many episodes will the policy be evaluated.
                Expected as an integer, or as None. The default is 1.
                If given as None, then the `num_episodes` value that was given
                while initializing this GymNE will be used.
            decrease_rewards_by: How much each reward value should be
                decreased. If left as None, the `decrease_rewards_by` value
                that was given while initializing this GymNE will be
                used.
        Returns:
            A dictionary containing the score and the timestep count.
        """
        return self.run(
            policy=policy,
            update_stats=update_stats,
            visualize=True,
            num_episodes=num_episodes,
            decrease_rewards_by=decrease_rewards_by,
        )

    def _ensure_obsnorm(self):
        if not self.observation_normalization:
            raise ValueError("This feature can only be used when observation_normalization=True.")

    def get_observation_stats(self) -> RunningStat:
        """Get the observation stats"""
        self._ensure_obsnorm()
        return self._obs_stats

    def _make_sync_data_for_actors(self) -> Any:
        if self.observation_normalization:
            return dict(obs_stats=self.get_observation_stats())
        else:
            return None

    def set_observation_stats(self, rs: RunningStat):
        """Set the observation stats"""
        self._ensure_obsnorm()
        self._obs_stats.reset()
        self._obs_stats.update(rs)

    def _use_sync_data_from_main(self, received: dict):
        for k, v in received.items():
            if k == "obs_stats":
                self.set_observation_stats(v)

    def pop_observation_stats(self) -> RunningStat:
        """Get and clear the collected observation stats"""
        self._ensure_obsnorm()
        result = self._collected_stats
        self._collected_stats = RunningStat()
        return result

    def _make_sync_data_for_main(self) -> Any:
        result = dict(episode_count=self.episode_count, interaction_count=self.interaction_count)

        if self.observation_normalization:
            result["obs_stats_delta"] = self.pop_observation_stats()

        return result

    def update_observation_stats(self, rs: RunningStat):
        """Update the observation stats via another RunningStat instance"""
        self._ensure_obsnorm()
        self._obs_stats.update(rs)

    def _use_sync_data_from_actors(self, received: list):
        total_episode_count = 0
        total_interaction_count = 0

        for data in received:
            data: dict
            total_episode_count += data["episode_count"]
            total_interaction_count += data["interaction_count"]
            if self.observation_normalization:
                self.update_observation_stats(data["obs_stats_delta"])

        self.set_episode_count(total_episode_count)
        self.set_interaction_count(total_interaction_count)

    def _make_pickle_data_for_main(self) -> dict:
        # For when the main Problem object (the non-remote one) gets pickled,
        # this function returns the counters of this remote Problem instance,
        # to be sent to the main one.
        return dict(interaction_count=self.interaction_count, episode_count=self.episode_count)

    def _use_pickle_data_from_main(self, state: dict):
        # For when a newly unpickled Problem object gets (re)parallelized,
        # this function restores the inner states specific to this remote
        # worker. In the case of GymNE, those inner states are episode
        # and interaction counters.
        for k, v in state.items():
            if k == "episode_count":
                self.set_episode_count(v)
            elif k == "interaction_count":
                self.set_interaction_count(v)
            else:
                raise ValueError(f"When restoring the inner state of a remote worker, unrecognized state key: {k}")

    def _extra_status(self, batch: SolutionBatch):
        return dict(total_interaction_count=self.interaction_count, total_episode_count=self.episode_count)

    @property
    def observation_normalization(self) -> bool:
        """
        Get whether or not observation normalization is enabled.
        """
        return self._observation_normalization

    def set_episode_count(self, n: int):
        """
        Set the episode count manually.
        """
        self._episode_count = int(n)

    def set_interaction_count(self, n: int):
        """
        Set the interaction count manually.
        """
        self._interaction_count = int(n)

    @property
    def interaction_count(self) -> int:
        """
        Get the total number of simulator interactions made.
        """
        return self._interaction_count

    @property
    def episode_count(self) -> int:
        """
        Get the total number of episodes completed.
        """
        return self._episode_count

    def _get_local_episode_count(self) -> int:
        return self.episode_count

    def _get_local_interaction_count(self) -> int:
        return self.interaction_count

    def _evaluate_network(self, policy: nn.Module) -> Union[float, torch.Tensor]:
        result = self.run(
            policy,
            update_stats=True,
            visualize=False,
            num_episodes=self._num_episodes,
            decrease_rewards_by=self._decrease_rewards_by,
        )
        return result["cumulative_reward"]

    def to_policy(self, x: Iterable, *, clip_actions: bool = True) -> nn.Module:
        """
        Convert the given parameter vector to a policy as a PyTorch module.

        If the problem is configured to have observation normalization,
        the PyTorch module also contains an additional normalization layer.

        Args:
            x: A sequence of real numbers, containing the parameters
                of a policy. Can be a PyTorch tensor, a numpy array,
                or a Solution.
            clip_actions: Whether or not to add an action clipping layer so
                that the generated actions will always be within an
                acceptable range for the environment.
        Returns:
            The policy expressed by the parameters.
        """

        policy = self.make_net(x)

        if self.observation_normalization and (self._obs_stats.count > 0):
            policy = ObsNormWrapperModule(policy, self._obs_stats)

        if clip_actions and isinstance(self._get_env().action_space, gym.spaces.Box):
            policy = ActClipWrapperModule(policy, self._get_env().action_space)

        return policy

    def save_solution(self, solution: Iterable, fname: Union[str, Path]):
        """
        Save the solution into a pickle file.
        Among the saved data within the pickle file are the solution
        (as a PyTorch tensor), the policy (as a `torch.nn.Module` instance),
        and observation stats (if any).

        Args:
            solution: The solution to be saved. This can be a PyTorch tensor,
                a `Solution` instance, or any `Iterable`.
            fname: The file name of the pickle file to be created.
        """

        # Convert the solution to a PyTorch tensor on the cpu.
        if isinstance(solution, torch.Tensor):
            solution = solution.to("cpu")
        elif isinstance(solution, Solution):
            solution = solution.values.clone().to("cpu")
        else:
            solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

        if isinstance(solution, ReadOnlyTensor):
            solution = solution.as_subclass(torch.Tensor)

        policy = self.to_policy(solution).to("cpu")

        # Store the solution and the policy.
        result = {
            "solution": solution,
            "policy": policy,
        }

        # If available, store the observation stats.
        if self.observation_normalization and (self._obs_stats is not None):
            result["obs_mean"] = torch.as_tensor(self._obs_stats.mean)
            result["obs_stdev"] = torch.as_tensor(self._obs_stats.stdev)
            result["obs_sum"] = torch.as_tensor(self._obs_stats.sum)
            result["obs_sum_of_squares"] = torch.as_tensor(self._obs_stats.sum_of_squares)

        # Some additional data.
        result["interaction_count"] = self.interaction_count
        result["episode_count"] = self.episode_count
        result["time"] = datetime.now()

        # If the environment is specified via a string ID, then store that ID.
        if isinstance(self._env_maker, str):
            result["env"] = self._env_maker

        # Save the dictionary which stores the data.
        with open(fname, "wb") as f:
            pickle.dump(result, f)

    def get_env(self) -> gym.Env:
        """
        Get the gym environment stored by this GymNE instance
        """
        return self._get_env()

episode_count: int property readonly

Get the total number of episodes completed.

interaction_count: int property readonly

Get the total number of simulator interactions made.

network_device: Union[str, torch.device] property readonly

The device on which the problem should place data, e.g. the network. In the case of GymNE, supported Gym environments return numpy arrays on CPU, which are then converted to Tensors. Therefore, it is almost always optimal to place the network on CPU.

observation_normalization: bool property readonly

Get whether or not observation normalization is enabled.

__init__(self, env=None, network=None, *, env_name=None, network_args=None, env_config=None, observation_normalization=False, num_episodes=1, episode_length=None, decrease_rewards_by=None, alive_bonus_schedule=None, action_noise_stdev=None, num_actors=None, actor_config=None, num_subbatches=None, subbatch_size=None, initial_bounds=(-1e-05, 1e-05)) special

__init__(...): Initialize the GymNE.

Parameters:

Name Type Description Default
env Union[str, Callable]

The gym environment to solve. Expected as a Callable (maybe a function returning a gym.Env, or maybe a gym.Env subclass), or as a string referring to a gym environment ID (e.g. "Ant-v4", "Humanoid-v4", etc.).

None
network Union[str, torch.nn.modules.module.Module, Callable[[], torch.nn.modules.module.Module]]

A network structure string, or a Callable (which can be a class inheriting from torch.nn.Module, or a function which returns a torch.nn.Module instance), or an instance of torch.nn.Module. The object provided here determines the structure of the neural network policy whose parameters will be evolved. A network structure string is a string which can be processed by evotorch.neuroevolution.net.str_to_net(...). Please see the documentation of the function evotorch.neuroevolution.net.str_to_net(...) to see what such a neural network structure string looks like. Note that this network can be a recurrent network. When the network's forward(...) method can optionally accept an additional positional argument for the hidden state of the network and returns an additional value for its next state, then the policy is treated as a recurrent one. When the network is given as a callable object (e.g. a subclass of nn.Module or a function) and this callable object is decorated via evotorch.decorators.pass_info, the following keyword arguments will be passed: (i) obs_length (the length of the observation vector), (ii) act_length (the length of the action vector), (iii) obs_shape (the shape tuple of the observation space), (iv) act_shape (the shape tuple of the action space), (v) obs_space (the Box object specifying the observation space), and (vi) act_space (the Box object specifying the action space). Note that act_space will always be given as a gym.spaces.Box instance, even when the actual gym environment has a discrete action space. This is because GymNE always expects the neural network to return a tensor of floating-point numbers. A sketch of a pass_info-decorated network is given after this parameter list.

None
env_name Union[str, Callable]

Deprecated alias for the keyword argument env. It is recommended to use the argument env instead.

None
network_args Optional[dict]

Optionally a dict-like object, storing keyword arguments to be passed to the network while instantiating it.

None
env_config Optional[collections.abc.Mapping]

Keyword arguments to pass to gym.make(...) while creating the gym environment.

None
observation_normalization bool

Whether or not to do online observation normalization.

False
num_episodes int

Number of episodes over which a single solution will be evaluated.

1
episode_length Optional[int]

Maximum number of simulator interactions allowed in a single episode. If left as None, whether or not an episode is terminated is determined only by the gym environment itself.

None
decrease_rewards_by Optional[float]

Some gym environments are defined in such a way that the agent gets a constant reward for each timestep it survives. This constant reward can also be called "survival bonus". Such a rewarding scheme can lead the evolution to local optima where the agent does nothing but does not die either, just to collect the survival bonuses. To prevent this, it can be desired to remove the survival bonuses from each reward obtained. If this is the case with the problem at hand, the user can set the argument decrease_rewards_by to a positive float number, and that number will be subtracted from each reward.

None
alive_bonus_schedule Optional[tuple]

Use this to add a customized amount of alive bonus. If left as None (which is the default), additional alive bonus will not be added. If given as a tuple (t, b), an alive bonus b will be added onto all the rewards beyond the timestep t. If given as a tuple (t0, t1, b), a partial (linearly increasing towards b) alive bonus will be added onto all the rewards between the timesteps t0 and t1, and a full alive bonus (which equals to b) will be added onto all the rewards beyond the timestep t1.

None
action_noise_stdev Optional[float]

If given as a real number s, then, for each generated action, Gaussian noise with standard deviation s will be sampled, and then this sampled noise will be added onto the action. If action noise is not desired, then this argument can be left as None.

None
num_actors Union[int, str]

Number of actors to create for parallelized evaluation of the solutions. One can also set this as "max", which means that an actor will be created on each available CPU. When the parallelization is enabled, each actor will have its own instance of the gym environment.

None
actor_config Optional[dict]

A dictionary, representing the keyword arguments to be passed to the options(...) used when creating the ray actor objects. To be used for explicitly allocating resources for each actor. For example, for declaring that each actor is to use a GPU, one can pass actor_config=dict(num_gpus=1). Can also be given as None (which is the default), if no such options are to be passed.

None
num_subbatches Optional[int]

If num_subbatches is None (assuming that subbatch_size is also None), then, when evaluating a population, the population will be split into n pieces, n being the number of actors, and each actor will evaluate its assigned piece. If num_subbatches is an integer m, then the population will be split into m pieces, and actors will continually accept the next unevaluated piece as they finish their current tasks. The arguments num_subbatches and subbatch_size cannot be given values other than None at the same time.

None
subbatch_size Optional[int]

If subbatch_size is None (assuming that num_subbatches is also None), then, when evaluating a population, the population will be split into n pieces, n being the number of actors, and each actor will evaluate its assigned piece. If subbatch_size is an integer m, then the population will be split into pieces of size m, and actors will continually accept the next unevaluated piece as they finish their current tasks. When there can be significant difference across the solutions in terms of computational requirements, specifying a subbatch_size can be beneficial, because, while one actor is busy with a subbatch containing computationally challenging solutions, other actors can accept more tasks and save time. The arguments num_subbatches and subbatch_size cannot be given values other than None at the same time.

None
initial_bounds Union[Iterable[Union[float, Iterable[float], torch.Tensor]], evotorch.core.BoundsPair]

Specifies an interval from which the values of the initial policy parameters will be drawn.

(-1e-05, 1e-05)
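
As referenced in the `network` entry above, the following is a sketch of a policy class decorated via `evotorch.decorators.pass_info`; the class name and hidden layer size are illustrative assumptions:

import torch
from torch import nn
from evotorch.decorators import pass_info
from evotorch.neuroevolution import GymNE

@pass_info
class TinyPolicy(nn.Module):
    # Because of @pass_info, GymNE passes obs_length, act_length,
    # obs_shape, act_shape, obs_space, and act_space as keyword
    # arguments when instantiating this class.
    def __init__(self, obs_length: int, act_length: int, **kwargs):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_length, 16),
            nn.Tanh(),
            nn.Linear(16, act_length),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

problem = GymNE(env="CartPole-v1", network=TinyPolicy)
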
Source code in evotorch/neuroevolution/gymne.py
def __init__(
    self,
    env: Optional[Union[str, Callable]] = None,
    network: Optional[Union[str, nn.Module, Callable[[], nn.Module]]] = None,
    *,
    env_name: Optional[Union[str, Callable]] = None,
    network_args: Optional[dict] = None,
    env_config: Optional[Mapping] = None,
    observation_normalization: bool = False,
    num_episodes: int = 1,
    episode_length: Optional[int] = None,
    decrease_rewards_by: Optional[float] = None,
    alive_bonus_schedule: Optional[tuple] = None,
    action_noise_stdev: Optional[float] = None,
    num_actors: Optional[Union[int, str]] = None,
    actor_config: Optional[dict] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
):
    """
    `__init__(...)`: Initialize the GymNE.

    Args:
        env: The gym environment to solve. Expected as a Callable
            (maybe a function returning a gym.Env, or maybe a gym.Env
            subclass), or as a string referring to a gym environment
            ID (e.g. "Ant-v4", "Humanoid-v4", etc.).
        network: A network structure string, or a Callable (which can be
            a class inheriting from `torch.nn.Module`, or a function
            which returns a `torch.nn.Module` instance), or an instance
            of `torch.nn.Module`.
            The object provided here determines the structure of the
            neural network policy whose parameters will be evolved.
            A network structure string is a string which can be processed
            by `evotorch.neuroevolution.net.str_to_net(...)`.
            Please see the documentation of the function
            `evotorch.neuroevolution.net.str_to_net(...)` to see what such
            a neural network structure string looks like.
            Note that this network can be a recurrent network.
            When the network's `forward(...)` method can optionally accept
            an additional positional argument for the hidden state of the
            network and returns an additional value for its next state,
            then the policy is treated as a recurrent one.
            When the network is given as a callable object (e.g.
            a subclass of `nn.Module` or a function) and this callable
            object is decorated via `evotorch.decorators.pass_info`,
            the following keyword arguments will be passed:
            (i) `obs_length` (the length of the observation vector),
            (ii) `act_length` (the length of the action vector),
            (iii) `obs_shape` (the shape tuple of the observation space),
            (iv) `act_shape` (the shape tuple of the action space),
            (v) `obs_space` (the Box object specifying the observation
            space), and
            (vi) `act_space` (the Box object specifying the action
            space). Note that `act_space` will always be given as a
            `gym.spaces.Box` instance, even when the actual gym
            environment has a discrete action space. This is because `GymNE`
            always expects the neural network to return a tensor of
            floating-point numbers.
        env_name: Deprecated alias for the keyword argument `env`.
            It is recommended to use the argument `env` instead.
        network_args: Optionally a dict-like object, storing keyword
            arguments to be passed to the network while instantiating it.
        env_config: Keyword arguments to pass to `gym.make(...)` while
            creating the `gym` environment.
        observation_normalization: Whether or not to do online observation
            normalization.
        num_episodes: Number of episodes over which a single solution will
            be evaluated.
        episode_length: Maximum number of simulator interactions allowed
            in a single episode. If left as None, whether or not an episode
            is terminated is determined only by the `gym` environment
            itself.
        decrease_rewards_by: Some gym environments are defined in such a way that
            the agent gets a constant reward for each timestep
            it survives. This constant reward can also be called
            "survival bonus". Such a rewarding scheme can lead the
            evolution to local optima where the agent does nothing
            but does not die either, just to collect the survival
            bonuses. To prevent this, it can be desired to
            remove the survival bonuses from each reward obtained.
            If this is the case with the problem at hand,
            the user can set the argument `decrease_rewards_by`
            to a positive float number, and that number will
            be subtracted from each reward.
        alive_bonus_schedule: Use this to add a customized amount of
            alive bonus.
            If left as None (which is the default), additional alive
            bonus will not be added.
            If given as a tuple `(t, b)`, an alive bonus `b` will be
            added onto all the rewards beyond the timestep `t`.
            If given as a tuple `(t0, t1, b)`, a partial (linearly
            increasing towards `b`) alive bonus will be added onto
            all the rewards between the timesteps `t0` and `t1`,
            and a full alive bonus (which equals to `b`) will be added
            onto all the rewards beyond the timestep `t1`.
        action_noise_stdev: If given as a real number `s`, then, for
            each generated action, Gaussian noise with standard
            deviation `s` will be sampled, and then this sampled noise
            will be added onto the action.
            If action noise is not desired, then this argument can be
            left as None.
        num_actors: Number of actors to create for parallelized
            evaluation of the solutions.
            One can also set this as "max", which means that
            an actor will be created on each available CPU.
            When the parallelization is enabled, each actor will have its
            own instance of the `gym` environment.
        actor_config: A dictionary, representing the keyword arguments
            to be passed to the options(...) used when creating the
            ray actor objects. To be used for explicitly allocating
            resources for each actor.
            For example, for declaring that each actor is to use a GPU,
            one can pass `actor_config=dict(num_gpus=1)`.
            Can also be given as None (which is the default),
            if no such options are to be passed.
        num_subbatches: If `num_subbatches` is None (assuming that
            `subbatch_size` is also None), then, when evaluating a
            population, the population will be split into n pieces, `n`
            being the number of actors, and each actor will evaluate
            its assigned piece. If `num_subbatches` is an integer `m`,
            then the population will be split into `m` pieces,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
        subbatch_size: If `subbatch_size` is None (assuming that
            `num_subbatches` is also None), then, when evaluating a
            population, the population will be split into `n` pieces, `n`
            being the number of actors, and each actor will evaluate its
            assigned piece. If `subbatch_size` is an integer `m`,
            then the population will be split into pieces of size `m`,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            When there can be significant difference across the solutions
            in terms of computational requirements, specifying a
            `subbatch_size` can be beneficial, because, while one
            actor is busy with a subbatch containing computationally
            challenging solutions, other actors can accept more
            tasks and save time.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
        initial_bounds: Specifies an interval from which the values of the
            initial policy parameters will be drawn.
    """
    # Store various environment information
    if (env is not None) and (env_name is None):
        self._env_maker = env
    elif (env is None) and (env_name is not None):
        self._env_maker = env_name
    elif (env is not None) and (env_name is not None):
        raise ValueError(
            f"Received values for both `env` ({repr(env)}) and `env_name` ({repr(env_name)})."
            f" Please specify the environment to solve via only one of these arguments, not both."
        )
    else:
        raise ValueError("Environment name is missing. Please specify it via the argument `env`.")

    # Make sure that the network argument is not missing.
    if network is None:
        raise ValueError(
            "Received None via the argument `network`."
            "Please provide the network as a string, or as a `Callable`, or as a `torch.nn.Module` instance."
        )

    # Store various environment information
    self._env_config = {} if env_config is None else deepcopy(dict(env_config))
    self._decrease_rewards_by = 0.0 if decrease_rewards_by is None else float(decrease_rewards_by)
    self._alive_bonus_schedule = alive_bonus_schedule
    self._action_noise_stdev = None if action_noise_stdev is None else float(action_noise_stdev)
    self._observation_normalization = bool(observation_normalization)
    self._num_episodes = int(num_episodes)
    self._episode_length = None if episode_length is None else int(episode_length)

    self._info_keys = dict(cumulative_reward="avg", interaction_count="sum")

    self._env: Optional[gym.Env] = None

    self._obs_stats: Optional[RunningStat] = None
    self._collected_stats: Optional[RunningStat] = None

    # Create a temporary environment to read its dimensions
    tmp_env = _make_env(self._env_maker, **(self._env_config))

    # Store the temporary environment's dimensions
    self._obs_length = len(tmp_env.observation_space.low)

    if isinstance(tmp_env.action_space, gym.spaces.Discrete):
        self._act_length = tmp_env.action_space.n
        self._box_act_space = gym.spaces.Box(low=float("-inf"), high=float("inf"), shape=(self._act_length,))
    else:
        self._act_length = len(tmp_env.action_space.low)
        self._box_act_space = tmp_env.action_space

    self._act_space = tmp_env.action_space
    self._obs_space = tmp_env.observation_space
    self._obs_shape = tmp_env.observation_space.low.shape

    # Validate the space types of the environment
    ensure_space_types(tmp_env)

    if self._observation_normalization:
        self._obs_stats = RunningStat()
        self._collected_stats = RunningStat()
    else:
        self._obs_stats = None
        self._collected_stats = None
    self._interaction_count: int = 0
    self._episode_count: int = 0

    super().__init__(
        objective_sense="max",  # RL is maximization
        network=network,  # Using the policy as the network
        network_args=network_args,
        initial_bounds=initial_bounds,
        num_actors=num_actors,
        actor_config=actor_config,
        subbatch_size=subbatch_size,
        device="cpu",
    )

    self.after_eval_hook.append(self._extra_status)

get_env(self)

Get the gym environment stored by this GymNE instance

Source code in evotorch/neuroevolution/gymne.py
def get_env(self) -> gym.Env:
    """
    Get the gym environment stored by this GymNE instance
    """
    return self._get_env()

get_observation_stats(self)

Get the observation stats

Source code in evotorch/neuroevolution/gymne.py
def get_observation_stats(self) -> RunningStat:
    """Get the observation stats"""
    self._ensure_obsnorm()
    return self._obs_stats

pop_observation_stats(self)

Get and clear the collected observation stats

Source code in evotorch/neuroevolution/gymne.py
def pop_observation_stats(self) -> RunningStat:
    """Get and clear the collected observation stats"""
    self._ensure_obsnorm()
    result = self._collected_stats
    self._collected_stats = RunningStat()
    return result
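
Together with get_observation_stats, set_observation_stats, and update_observation_stats, this method forms the main/actor synchronization cycle for observation normalization. EvoTorch performs this orchestration internally; the sketch below only illustrates the flow, with `actor_problem` and `main_problem` as hypothetical names:

# On a remote actor: return the stats collected since the last sync,
# clearing the local accumulator in the process.
delta = actor_problem.pop_observation_stats()

# On the main process: merge the actor's delta into the global stats.
main_problem.update_observation_stats(delta)

# Broadcast the merged stats back, so every actor normalizes consistently.
actor_problem.set_observation_stats(main_problem.get_observation_stats())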

run(self, policy, *, update_stats=False, visualize=False, num_episodes=None, decrease_rewards_by=None)

Evaluate the policy on the gym environment.

Parameters:

Name Type Description Default
policy Union[torch.nn.modules.module.Module, Iterable]

The policy to be evaluated. This can be a torch module or a sequence of real numbers representing the parameters of a policy network.

required
update_stats bool

Whether or not to update the observation normalization data while running the policy. If observation normalization is not enabled, then this argument will be ignored.

False
visualize bool

Whether or not to render the environment while running the policy.

False
num_episodes Optional[int]

Over how many episodes will the policy be evaluated. Expected as None (which is the default), or as an integer. If given as None, then the num_episodes value that was given while initializing this GymNE will be used.

None
decrease_rewards_by Optional[float]

How much each reward value should be decreased. If left as None, the decrease_rewards_by value that was given while initializing this GymNE will be used.

None

Returns:

Type Description
dict

A dictionary containing the score and the timestep count.

Source code in evotorch/neuroevolution/gymne.py
def run(
    self,
    policy: Union[nn.Module, Iterable],
    *,
    update_stats: bool = False,
    visualize: bool = False,
    num_episodes: Optional[int] = None,
    decrease_rewards_by: Optional[float] = None,
) -> dict:
    """
    Evaluate the policy on the gym environment.

    Args:
        policy: The policy to be evaluated. This can be a torch module
            or a sequence of real numbers representing the parameters
            of a policy network.
        update_stats: Whether or not to update the observation
            normalization data while running the policy. If observation
            normalization is not enabled, then this argument will be
            ignored.
        visualize: Whether or not to render the environment while running
            the policy.
        num_episodes: Over how many episodes will the policy be evaluated.
            Expected as None (which is the default), or as an integer.
            If given as None, then the `num_episodes` value that was given
            while initializing this GymNE will be used.
        decrease_rewards_by: How much each reward value should be
            decreased. If left as None, the `decrease_rewards_by` value
            that was given while initializing this GymNE will be
            used.
    Returns:
        A dictionary containing the score and the timestep count.
    """
    if not isinstance(policy, nn.Module):
        policy = self.make_net(policy)

    if num_episodes is None:
        num_episodes = self._num_episodes

    try:
        policy.eval()

        episode_results = [
            self._rollout(
                policy=policy,
                update_stats=update_stats,
                visualize=visualize,
                decrease_rewards_by=decrease_rewards_by,
            )
            for _ in range(num_episodes)
        ]

        results = _accumulate_all_across_dicts(episode_results, self._info_keys)
        return results
    finally:
        policy.train()
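
As a usage sketch, run(...) can re-evaluate a specific solution outside of the evolutionary loop. Here, `problem` and `searcher` are assumed to be a GymNE instance and a finished searcher, as in the quickstart sketch near the top of this page:

# Re-evaluate the current center of the search distribution
# over 5 fresh episodes.
center = searcher.status["center"]
policy = problem.to_policy(center)
result = problem.run(policy, num_episodes=5)

print(result["cumulative_reward"])  # averaged across the 5 episodes
print(result["interaction_count"])  # summed across the 5 episodes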

save_solution(self, solution, fname)

Save the solution into a pickle file. Among the saved data within the pickle file are the solution (as a PyTorch tensor), the policy (as a torch.nn.Module instance), and observation stats (if any).

Parameters:

Name Type Description Default
solution Iterable

The solution to be saved. This can be a PyTorch tensor, a Solution instance, or any Iterable.

required
fname Union[str, pathlib.Path]

The file name of the pickle file to be created.

required
Source code in evotorch/neuroevolution/gymne.py
def save_solution(self, solution: Iterable, fname: Union[str, Path]):
    """
    Save the solution into a pickle file.
    Among the saved data within the pickle file are the solution
    (as a PyTorch tensor), the policy (as a `torch.nn.Module` instance),
    and observation stats (if any).

    Args:
        solution: The solution to be saved. This can be a PyTorch tensor,
            a `Solution` instance, or any `Iterable`.
        fname: The file name of the pickle file to be created.
    """

    # Convert the solution to a PyTorch tensor on the cpu.
    if isinstance(solution, torch.Tensor):
        solution = solution.to("cpu")
    elif isinstance(solution, Solution):
        solution = solution.values.clone().to("cpu")
    else:
        solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

    if isinstance(solution, ReadOnlyTensor):
        solution = solution.as_subclass(torch.Tensor)

    policy = self.to_policy(solution).to("cpu")

    # Store the solution and the policy.
    result = {
        "solution": solution,
        "policy": policy,
    }

    # If available, store the observation stats.
    if self.observation_normalization and (self._obs_stats is not None):
        result["obs_mean"] = torch.as_tensor(self._obs_stats.mean)
        result["obs_stdev"] = torch.as_tensor(self._obs_stats.stdev)
        result["obs_sum"] = torch.as_tensor(self._obs_stats.sum)
        result["obs_sum_of_squares"] = torch.as_tensor(self._obs_stats.sum_of_squares)

    # Some additional data.
    result["interaction_count"] = self.interaction_count
    result["episode_count"] = self.episode_count
    result["time"] = datetime.now()

    # If the environment is specified via a string ID, then store that ID.
    if isinstance(self._env_maker, str):
        result["env"] = self._env_maker

    # Save the dictionary which stores the data.
    with open(fname, "wb") as f:
        pickle.dump(result, f)
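
A short sketch of saving and re-loading an evolved solution (the file name and the `searcher` object are assumptions):

import pickle

# Save the searcher's current distribution center, together with the
# corresponding policy module and the observation stats (if any).
problem.save_solution(searcher.status["center"], "policy.pickle")

# Later, the pickle file can be loaded back; the "policy" entry is a
# ready-to-use torch.nn.Module.
with open("policy.pickle", "rb") as f:
    saved = pickle.load(f)
policy = saved["policy"]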

set_episode_count(self, n)

Set the episode count manually.

Source code in evotorch/neuroevolution/gymne.py
def set_episode_count(self, n: int):
    """
    Set the episode count manually.
    """
    self._episode_count = int(n)

set_interaction_count(self, n)

Set the interaction count manually.

Source code in evotorch/neuroevolution/gymne.py
def set_interaction_count(self, n: int):
    """
    Set the interaction count manually.
    """
    self._interaction_count = int(n)

set_observation_stats(self, rs)

Set the observation stats

Source code in evotorch/neuroevolution/gymne.py
def set_observation_stats(self, rs: RunningStat):
    """Set the observation stats"""
    self._ensure_obsnorm()
    self._obs_stats.reset()
    self._obs_stats.update(rs)

to_policy(self, x, *, clip_actions=True)

Convert the given parameter vector to a policy as a PyTorch module.

If the problem is configured to have observation normalization, the PyTorch module also contains an additional normalization layer.

Parameters:

Name Type Description Default
x Iterable

A sequence of real numbers, containing the parameters of a policy. Can be a PyTorch tensor, a numpy array, or a Solution.

required
clip_actions bool

Whether or not to add an action clipping layer so that the generated actions will always be within an acceptable range for the environment.

True

Returns:

Type Description
Module

The policy expressed by the parameters.

Source code in evotorch/neuroevolution/gymne.py
def to_policy(self, x: Iterable, *, clip_actions: bool = True) -> nn.Module:
    """
    Convert the given parameter vector to a policy as a PyTorch module.

    If the problem is configured to have observation normalization,
    the PyTorch module also contains an additional normalization layer.

    Args:
        x: A sequence of real numbers, containing the parameters
            of a policy. Can be a PyTorch tensor, a numpy array,
            or a Solution.
        clip_actions: Whether or not to add an action clipping layer so
            that the generated actions will always be within an
            acceptable range for the environment.
    Returns:
        The policy expressed by the parameters.
    """

    policy = self.make_net(x)

    if self.observation_normalization and (self._obs_stats.count > 0):
        policy = ObsNormWrapperModule(policy, self._obs_stats)

    if clip_actions and isinstance(self._get_env().action_space, gym.spaces.Box):
        policy = ActClipWrapperModule(policy, self._get_env().action_space)

    return policy
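
As a usage sketch, a solution found by a search algorithm can be turned into a standalone policy module. Here, problem is a GymNE instance and best_solution is hypothetical (e.g. taken from a searcher's status dictionary):

policy = problem.to_policy(best_solution)
# The returned module already embeds observation normalization and action
# clipping, where applicable, so it can be used for inference as-is.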

update_observation_stats(self, rs)

Update the observation stats via another RunningStat instance

Source code in evotorch/neuroevolution/gymne.py
def update_observation_stats(self, rs: RunningStat):
    """Update the observation stats via another RunningStat instance"""
    self._ensure_obsnorm()
    self._obs_stats.update(rs)
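
As a sketch of how the stats-related methods can be combined, suppose observation stats were collected by another GymNE instance (here, a hypothetical problem_worker; a get_observation_stats() accessor is assumed to be available, as suggested by the setter/updater pair documented here):

rs = problem_worker.get_observation_stats()  # assumed accessor returning a RunningStat
problem.update_observation_stats(rs)  # merge the worker's stats into this problem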

visualize(self, policy, *, update_stats=False, num_episodes=1, decrease_rewards_by=None)

Evaluate the policy and render its actions in the environment.

Parameters:

    policy : Union[torch.nn.modules.module.Module, Iterable] (required)
        The policy to be evaluated. This can be a torch module or a
        sequence of real numbers representing the parameters of a policy
        network.
    update_stats : bool (default: False)
        Whether or not to update the observation normalization data while
        running the policy. If observation normalization is not enabled,
        then this argument will be ignored.
    num_episodes : Optional[int] (default: 1)
        Over how many episodes the policy will be evaluated. Expected as
        None or as an integer. If given as None, the num_episodes value
        that was given while initializing this GymNE will be used.
    decrease_rewards_by : Optional[float] (default: None)
        How much each reward value should be decreased. If left as None,
        the decrease_rewards_by value that was given while initializing
        this GymNE will be used.

Returns:

    dict
        A dictionary containing the score and the timestep count.

Source code in evotorch/neuroevolution/gymne.py
def visualize(
    self,
    policy: Union[nn.Module, Iterable],
    *,
    update_stats: bool = False,
    num_episodes: Optional[int] = 1,
    decrease_rewards_by: Optional[float] = None,
) -> dict:
    """
    Evaluate the policy and render its actions in the environment.

    Args:
        policy: The policy to be evaluated. This can be a torch module
            or a sequence of real numbers representing the parameters
            of a policy network.
        update_stats: Whether or not to update the observation
            normalization data while running the policy. If observation
            normalization is not enabled, then this argument will be
            ignored.
        num_episodes: Over how many episodes the policy will be evaluated.
            Expected as None or as an integer. If given as None, then the
            `num_episodes` value that was given while initializing this
            GymNE will be used.
        decrease_rewards_by: How much each reward value should be
            decreased. If left as None, the `decrease_rewards_by` value
            that was given while initializing this GymNE will be used.
    Returns:
        A dictionary containing the score and the timestep count.
    """
    return self.run(
        policy=policy,
        update_stats=update_stats,
        visualize=True,
        num_episodes=num_episodes,
        decrease_rewards_by=decrease_rewards_by,
    )
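
A typical use is to watch a trained policy after evolution has finished. A minimal sketch, assuming problem was built with a renderable environment and center_solution is a hypothetical solution obtained from a search:

result = problem.visualize(center_solution)
print(result)  # a dict reporting the score and the timestep count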

neproblem

This namespace contains the NEProblem class.

NEProblem (BaseNEProblem)

Base class for neuro-evolution problems where the goal is to optimize the parameters of a neural network represented as a PyTorch module.

Any problem inheriting from this class is expected to override the method _evaluate_network(self, net: torch.nn.Module) -> Union[torch.Tensor, float] where net is the neural network to be evaluated, and the return value is a scalar or a vector (for multi-objective cases) expressing the fitness value(s).

Alternatively, this class can be directly instantiated in the following way:

def f(module: MyTorchModuleClass) -> Union[float, torch.Tensor, tuple]:
    # Evaluate the given PyTorch module here
    fitness = ...
    return fitness


problem = NEProblem("min", MyTorchModuleClass, f, ...)

which specifies that the problem's goal is to minimize the return of the function f. For multi-objective cases, the fitness returned by f is expected as a 1-dimensional tensor. For when the problem has additional evaluation data, a two-element tuple can be returned by f instead, where the first element is the fitness value(s) and the second element is a 1-dimensional tensor storing the additional data.
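
For a concrete illustration, below is a minimal, self-contained sketch; the network structure string and the fitness function are illustrative, not part of the library:

import torch
from evotorch.neuroevolution import NEProblem


def f(module: torch.nn.Module) -> torch.Tensor:
    # Illustrative fitness (to be minimized): how far the module maps a
    # fixed input away from a fixed target.
    x = torch.ones(3)
    target = torch.zeros(5)
    return torch.sum((module(x) - target) ** 2)


problem = NEProblem("min", "Linear(3, 5)", f)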

Source code in evotorch/neuroevolution/neproblem.py
class NEProblem(BaseNEProblem):
    """
    Base class for neuro-evolution problems where the goal is to optimize the
    parameters of a neural network represented as a PyTorch module.

    Any problem inheriting from this class is expected to override the method
    `_evaluate_network(self, net: torch.nn.Module) -> Union[torch.Tensor, float]`
    where `net` is the neural network to be evaluated, and the return value
    is a scalar or a vector (for multi-objective cases) expressing the
    fitness value(s).

    Alternatively, this class can be directly instantiated in the following
    way:

    ```python
    def f(module: MyTorchModuleClass) -> Union[float, torch.Tensor, tuple]:
        # Evaluate the given PyTorch module here
        fitness = ...
        return fitness


    problem = NEProblem("min", MyTorchModuleClass, f, ...)
    ```

    which specifies that the problem's goal is to minimize the return of the
    function `f`.
    For multi-objective cases, the fitness returned by `f` is expected as a
    1-dimensional tensor. For when the problem has additional evaluation data,
    a two-element tuple can be returned by `f` instead, where the first
    element is the fitness value(s) and the second element is a 1-dimensional
    tensor storing the additional data.
    """

    def __init__(
        self,
        objective_sense: ObjectiveSense,
        network: Union[str, nn.Module, Callable[[], nn.Module]],
        network_eval_func: Optional[Callable] = None,
        *,
        network_args: Optional[dict] = None,
        initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
        eval_dtype: Optional[DType] = None,
        eval_data_length: int = 0,
        seed: Optional[int] = None,
        num_actors: Optional[Union[int, str]] = None,
        actor_config: Optional[dict] = None,
        num_gpus_per_actor: Optional[Union[int, float, str]] = None,
        num_subbatches: Optional[int] = None,
        subbatch_size: Optional[int] = None,
        device: Optional[Device] = None,
    ):
        """
        `__init__(...)`: Initialize the NEProblem.

        Args:
            objective_sense: The objective sense, expected as "min" or "max"
                for single-objective cases, or as a sequence of strings
                (each string being "min" or "max") for multi-objective cases.
            network: A network structure string, or a Callable (which can be
                a class inheriting from `torch.nn.Module`, or a function
                which returns a `torch.nn.Module` instance), or an instance
                of `torch.nn.Module`.
                The object provided here determines the structure of the
                neural network whose parameters will be evolved.
                A network structure string is a string which can be processed
                by `evotorch.neuroevolution.net.str_to_net(...)`.
                Please see the documentation of the function
                `evotorch.neuroevolution.net.str_to_net(...)` to see what such
                a neural network structure string looks like.
            network_eval_func: Optionally a function (or any Callable object)
                which receives a PyTorch module as its argument, and returns
                either a fitness, or a two-element tuple containing the fitness
                and the additional evaluation data. The fitness can be a scalar
                (for single-objective cases) or a 1-dimensional tensor (for
                multi-objective cases). The additional evaluation data is
                expected as a 1-dimensional tensor.
                If this argument is left as None, it will be expected that
                the method `_evaluate_network(...)` is overridden by the
                inheriting class.
            network_args: Optionally a dict-like object, storing keyword
                arguments to be passed to the network while instantiating it.
            initial_bounds: Specifies an interval from which the values of the
                initial neural network parameters will be drawn.
            eval_dtype: dtype to be used for fitnesses. If not specified, then
                `eval_dtype` will be inferred from the dtype of the parameters
                of the neural network.
                In more detail, if the neural network's parameters have a
                float dtype, `eval_dtype` will be a compatible float.
                Otherwise, it will be "float32".
            eval_data_length: Length of the extra evaluation data.
            seed: Random number seed. If left as None, this NEProblem instance
                will not have its own random generator, and the global random
                generator of PyTorch will be used instead.
            num_actors: Number of actors to create for parallelized
                evaluation of the solutions.
                Certain string values are also accepted.
                When given as "max" or as "num_cpus", the number of actors
                will be equal to the number of all available CPUs in the ray
                cluster.
                When given as "num_gpus", the number of actors will be
                equal to the number of all available GPUs in the ray
                cluster, and each actor will be assigned a GPU.
                When given as "num_devices", the number of actors will be
                equal to the minimum among the number of CPUs and the number
                of GPUs available in the cluster (or will be equal to the
                number of CPUs if there is no GPU), and each actor will be
                assigned a GPU (if available).
                If `num_actors` is given as "num_gpus" or "num_devices",
                the argument `num_gpus_per_actor` must not be used,
                and the `actor_config` dictionary must not contain the
                key "num_gpus".
                If `num_actors` is given as something other than "num_gpus"
                or "num_devices", and if you wish to assign GPUs to each
                actor, then please see the argument `num_gpus_per_actor`.
            actor_config: A dictionary, representing the keyword arguments
                to be passed to the options(...) used when creating the
                ray actor objects. To be used for explicitly allocating
                resources per each actor.
                For example, for declaring that each actor is to use a GPU,
                one can pass `actor_config=dict(num_gpus=1)`.
                Can also be given as None (which is the default),
                if no such options are to be passed.
            num_gpus_per_actor: Number of GPUs to be allocated by each
                remote actor.
                The default behavior is to NOT allocate any GPU at all
                (which is the default behavior of the ray library as well).
                When given as a number `n`, each actor will be given
                `n` GPUs (where `n` can be an integer, or can be a `float`
                for fractional allocation).
                When given as a string "max", then the available GPUs
                across the entire ray cluster (or within the local computer
                in the simplest cases) will be equally distributed among
                the actors.
                When given as a string "all", then each actor will have
                access to all the GPUs (this will be achieved by suppressing
                the environment variable `CUDA_VISIBLE_DEVICES` for each
                actor).
                When the problem is not distributed (i.e. when there are
                no actors), this argument is expected to be left as None.
            num_subbatches: If `num_subbatches` is None (assuming that
                `subbatch_size` is also None), then, when evaluating a
                population, the population will be split into n pieces, `n`
                being the number of actors, and each actor will evaluate
                its assigned piece. If `num_subbatches` is an integer `m`,
                then the population will be split into `m` pieces,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
                While using a distributed algorithm, this argument determines
                how many sub-batches will be generated, and therefore,
                how many gradients will be computed by the remote actors.
            subbatch_size: If `subbatch_size` is None (assuming that
                `num_subbatches` is also None), then, when evaluating a
                population, the population will be split into `n` pieces, `n`
                being the number of actors, and each actor will evaluate its
                assigned piece. If `subbatch_size` is an integer `m`,
                then the population will be split into pieces of size `m`,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                When there can be significant difference across the solutions
                in terms of computational requirements, specifying a
                `subbatch_size` can be beneficial, because, while one
                actor is busy with a subbatch containing computationally
                challenging solutions, other actors can accept more
                tasks and save time.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
                While using a distributed algorithm, this argument determines
                the size of a sub-batch (or sub-population) sampled by a
                remote actor for computing a gradient.
                In distributed mode, it is expected that the population size
                is divisible by `subbatch_size`.
            device: Default device in which a new population will be generated
                and the neural networks will operate.
                If not specified, "cpu" will be used.
        """
        # Set the main device of the problem
        # Although the operation of setting the main device is done by the main Problem class,
        # here we need this at an earlier stage.
        if device is None:
            device = "cpu"
        self._device = torch.device(device)

        # Set the network
        self._original_network = network
        self._network_args = {} if network_args is None else deepcopy(network_args)
        if isinstance(self._original_network, nn.Module):
            self._original_network = self._original_network.cpu()

        # Store the function that will evaluate the network, if available
        self._network_eval_func: Optional[Callable] = network_eval_func

        self.instantiated_network: nn.Module = None

        # Create temporary network
        temp_network = self._instantiate_net(self._original_network, device="cpu")

        super().__init__(
            objective_sense=objective_sense,
            initial_bounds=initial_bounds,
            bounds=None,  # Neuroevolution is an unbounded problem
            solution_length=count_parameters(temp_network),  # The solution length is inherited from the network passed
            dtype=next(temp_network.parameters()).dtype,  # The datatype is inherited from the network passed
            eval_dtype=eval_dtype,
            device=device,
            eval_data_length=eval_data_length,
            seed=seed,
            num_actors=num_actors,
            num_gpus_per_actor=num_gpus_per_actor,
            actor_config=actor_config,
            num_subbatches=num_subbatches,
            subbatch_size=subbatch_size,
            store_solution_stats=None,
        )

    @property
    def network_device(self) -> Device:
        """The device on which the problem should place data e.g. the network"""
        cpu_device = torch.device("cpu")
        if self.is_main:
            # This is the case where this is the main process (not a remote actor)
            if self.device == cpu_device:
                # If the main device of the problem is "cpu", then we assume that the network is going to be on the cpu as well
                return cpu_device
            else:
                # If the main device of the problem is some other device, then it is that device into which the network will be put
                return self.device
        else:
            # If this is a remote actor, then the network will be put into the auxiliary device allocated for that actor
            return self.aux_device

    @property
    def _str_network_constants(self) -> dict:
        """
        Named constants which will be passed to `str_to_net`.
        To be overridden by the user for custom fixed constants for a problem.
        """
        return {}

    @property
    def _network_constants(self) -> dict:
        """
        Named constants which will be passed to the network instantiation.
        To be overridden by the user for custom fixed constants for a problem.
        """
        return {}

    def network_constants(self) -> dict:
        """Named constants which can be passed to the network instantiation"""
        constants = {}
        constants.update(self._network_constants)
        constants.update(self._network_args)
        return constants

    @property
    def _nonserialized_attribs(self) -> List[str]:
        return ["instantiated_network"]

    def _instantiate_net(self, network: Union[str, nn.Module, dict], device: Optional[Device] = None) -> nn.Module:
        """Instantiate the network on the target device, to be overridden by the user for custom behaviour
        Returns:
            instantiated_network (nn.Module): The network instantiated on the target device
        """
        # Branching point determines instantiation of network
        if isinstance(network, str):
            # Passed argument was a string representation of a torch module
            net_consts = {}
            net_consts.update(self.network_constants())
            net_consts.update(self._str_network_constants)
            instantiated_network = str_to_net(network, **net_consts)
        elif isinstance(network, nn.Module):
            # Passed argument was directly a torch module
            instantiated_network = network
        else:
            # Passed argument was callable yielding network
            instantiated_network = pass_info_if_needed(network, self._network_constants)(**self._network_args)

        # Map to device
        device = self.network_device if device is None else device
        instantiated_network = instantiated_network.to(device)

        return instantiated_network

    def _prepare(self) -> None:
        """Instantiate the network on the target device, if not already done"""
        self.instantiated_network = self._instantiate_net(self._original_network)
        # Clear reference to original network
        self._original_network = None

    def make_net(self, parameters: Iterable) -> nn.Module:
        """
        Make a new network filled with the provided parameters.

        Args:
            parameters: Parameters to be used as weights within the network.
                Can be a Solution, or any 1-dimensional Iterable that can be
                converted to a PyTorch tensor.
        Returns:
            A new network, as a `torch.nn.Module` instance.
        """
        if isinstance(parameters, Solution):
            parameters = parameters.access_values(keep_evals=True)
        else:
            parameters = self.as_tensor(parameters)
        with torch.no_grad():
            net = deepcopy(self.parameterize_net(parameters))
        return net

    def parameterize_net(self, parameters: torch.Tensor) -> nn.Module:
        """Parameterize the network with a given set of parameters.
        Args:
            parameters (torch.Tensor): The parameters with which to instantiate the network
        Returns:
            instantiated_network (nn.Module): The network instantiated with the parameters
        """
        # Check if network exists
        if self.instantiated_network is None:
            self.instantiated_network = self._instantiate_net(self._original_network)

        network = self.instantiated_network

        # Move the parameters if needed
        if parameters.device != self.network_device:
            parameters = parameters.to(self.network_device)

        # Fill the network with the parameters
        fill_parameters(network, parameters)

        # Return the network
        return network

    @property
    def _grad_device(self) -> Device:
        """
        Get the device in which new solutions will be made in distributed mode.

        In more detail, in distributed mode, each actor creates its own
        sub-populations, evaluates them, and computes its own gradient
        (all such actor gradients eventually being collected by the
        distribution-based search algorithm in the main process).
        For some problem types, it can make sense for the remote actors to
        create their temporary sub-populations on another device
        (e.g. on the GPU that is allocated specifically for them).
        For such situations, one is encouraged to override this property
        and make it return whatever device is to be used.

        In the case of NEProblem, this property returns whatever device
        is specified by the property `network_device`.
        """
        return self.network_device

    def _evaluate_network(self, network: nn.Module) -> Union[float, torch.Tensor, tuple]:
        """
        Evaluate a network and return the evaluation result(s).

        In the case where the `__init__` of `NEProblem` was not given
        a network evaluator function (via the argument `network_eval_func`),
        it will be expected that the inheriting class overrides this
        method and defines how a network should be evaluated.

        Args:
            network (nn.Module): The network to evaluate
        Returns:
            fitness: The networks' fitness value(s), as a scalar for
                single-objective cases, or as a 1-dimensional tensor
                for multi-objective cases. The returned value can also
                be a two-element tuple where the first element is the
                fitness (as a scalar or as a vector) and the second
                element is a 1-dimensional vector storing the extra
                evaluation data.
        """
        raise NotImplementedError

    def _evaluate(self, solution: Solution):
        """
        Evaluate a single solution.
        This is achieved by parameterising the problem's attribute
        named `instantiated_network`, and then evaluating the network
        with the method `_evaluate_network(...)`.

        Args:
            solution (Solution): The solution to evaluate.
        """
        parameters = solution.values

        if self._network_eval_func is None:
            evaluator = self._evaluate_network
        else:
            evaluator = self._network_eval_func

        fitnesses = evaluator(self.parameterize_net(parameters))

        if isinstance(fitnesses, tuple):
            solution.set_evals(*fitnesses)
        else:
            solution.set_evals(fitnesses)

network_device: Union[str, torch.device] property readonly

The device on which the problem should place data, e.g. the network

__init__(self, objective_sense, network, network_eval_func=None, *, network_args=None, initial_bounds=(-1e-05, 1e-05), eval_dtype=None, eval_data_length=0, seed=None, num_actors=None, actor_config=None, num_gpus_per_actor=None, num_subbatches=None, subbatch_size=None, device=None) special

__init__(...): Initialize the NEProblem.

Parameters:

    objective_sense : Union[str, Iterable[str]] (required)
        The objective sense, expected as "min" or "max" for single-objective
        cases, or as a sequence of strings (each string being "min" or
        "max") for multi-objective cases.
    network : Union[str, torch.nn.modules.module.Module, Callable[[], torch.nn.modules.module.Module]] (required)
        A network structure string, or a Callable (which can be a class
        inheriting from torch.nn.Module, or a function which returns a
        torch.nn.Module instance), or an instance of torch.nn.Module.
        The object provided here determines the structure of the neural
        network whose parameters will be evolved. A network structure
        string is a string which can be processed by
        evotorch.neuroevolution.net.str_to_net(...). Please see the
        documentation of that function to see what such a neural network
        structure string looks like.
    network_eval_func : Optional[Callable] (default: None)
        Optionally a function (or any Callable object) which receives a
        PyTorch module as its argument, and returns either a fitness, or a
        two-element tuple containing the fitness and the additional
        evaluation data. The fitness can be a scalar (for single-objective
        cases) or a 1-dimensional tensor (for multi-objective cases). The
        additional evaluation data is expected as a 1-dimensional tensor.
        If this argument is left as None, it will be expected that the
        method _evaluate_network(...) is overridden by the inheriting
        class.
    network_args : Optional[dict] (default: None)
        Optionally a dict-like object, storing keyword arguments to be
        passed to the network while instantiating it.
    initial_bounds : Union[Iterable[Union[float, Iterable[float], torch.Tensor]], evotorch.core.BoundsPair] (default: (-1e-05, 1e-05))
        Specifies an interval from which the values of the initial neural
        network parameters will be drawn.
    eval_dtype : Union[str, torch.dtype, numpy.dtype, Type] (default: None)
        dtype to be used for fitnesses. If not specified, then eval_dtype
        will be inferred from the dtype of the parameters of the neural
        network. In more detail, if the neural network's parameters have a
        float dtype, eval_dtype will be a compatible float. Otherwise, it
        will be "float32".
    eval_data_length : int (default: 0)
        Length of the extra evaluation data.
    seed : Optional[int] (default: None)
        Random number seed. If left as None, this NEProblem instance will
        not have its own random generator, and the global random generator
        of PyTorch will be used instead.
    num_actors : Union[int, str] (default: None)
        Number of actors to create for parallelized evaluation of the
        solutions. Certain string values are also accepted. When given as
        "max" or as "num_cpus", the number of actors will be equal to the
        number of all available CPUs in the ray cluster. When given as
        "num_gpus", the number of actors will be equal to the number of
        all available GPUs in the ray cluster, and each actor will be
        assigned a GPU. When given as "num_devices", the number of actors
        will be equal to the minimum among the number of CPUs and the
        number of GPUs available in the cluster (or will be equal to the
        number of CPUs if there is no GPU), and each actor will be
        assigned a GPU (if available). If num_actors is given as
        "num_gpus" or "num_devices", the argument num_gpus_per_actor must
        not be used, and the actor_config dictionary must not contain the
        key "num_gpus". If num_actors is given as something other than
        "num_gpus" or "num_devices", and if you wish to assign GPUs to
        each actor, then please see the argument num_gpus_per_actor.
    actor_config : Optional[dict] (default: None)
        A dictionary, representing the keyword arguments to be passed to
        the options(...) used when creating the ray actor objects. To be
        used for explicitly allocating resources per each actor. For
        example, for declaring that each actor is to use a GPU, one can
        pass actor_config=dict(num_gpus=1). Can also be given as None
        (which is the default), if no such options are to be passed.
    num_gpus_per_actor : Union[int, float, str] (default: None)
        Number of GPUs to be allocated by each remote actor. The default
        behavior is to NOT allocate any GPU at all (which is the default
        behavior of the ray library as well). When given as a number n,
        each actor will be given n GPUs (where n can be an integer, or can
        be a float for fractional allocation). When given as a string
        "max", then the available GPUs across the entire ray cluster (or
        within the local computer in the simplest cases) will be equally
        distributed among the actors. When given as a string "all", then
        each actor will have access to all the GPUs (this will be achieved
        by suppressing the environment variable CUDA_VISIBLE_DEVICES for
        each actor). When the problem is not distributed (i.e. when there
        are no actors), this argument is expected to be left as None.
    num_subbatches : Optional[int] (default: None)
        If num_subbatches is None (assuming that subbatch_size is also
        None), then, when evaluating a population, the population will be
        split into n pieces, n being the number of actors, and each actor
        will evaluate its assigned piece. If num_subbatches is an integer
        m, then the population will be split into m pieces, and actors
        will continually accept the next unevaluated piece as they finish
        their current tasks. The arguments num_subbatches and
        subbatch_size cannot be given values other than None at the same
        time. While using a distributed algorithm, this argument
        determines how many sub-batches will be generated, and therefore,
        how many gradients will be computed by the remote actors.
    subbatch_size : Optional[int] (default: None)
        If subbatch_size is None (assuming that num_subbatches is also
        None), then, when evaluating a population, the population will be
        split into n pieces, n being the number of actors, and each actor
        will evaluate its assigned piece. If subbatch_size is an integer
        m, then the population will be split into pieces of size m, and
        actors will continually accept the next unevaluated piece as they
        finish their current tasks. When there can be significant
        difference across the solutions in terms of computational
        requirements, specifying a subbatch_size can be beneficial,
        because, while one actor is busy with a subbatch containing
        computationally challenging solutions, other actors can accept
        more tasks and save time. The arguments num_subbatches and
        subbatch_size cannot be given values other than None at the same
        time. While using a distributed algorithm, this argument
        determines the size of a sub-batch (or sub-population) sampled by
        a remote actor for computing a gradient. In distributed mode, it
        is expected that the population size is divisible by
        subbatch_size.
    device : Union[str, torch.device] (default: None)
        Default device in which a new population will be generated and the
        neural networks will operate. If not specified, "cpu" will be
        used.
Source code in evotorch/neuroevolution/neproblem.py
def __init__(
    self,
    objective_sense: ObjectiveSense,
    network: Union[str, nn.Module, Callable[[], nn.Module]],
    network_eval_func: Optional[Callable] = None,
    *,
    network_args: Optional[dict] = None,
    initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
    eval_dtype: Optional[DType] = None,
    eval_data_length: int = 0,
    seed: Optional[int] = None,
    num_actors: Optional[Union[int, str]] = None,
    actor_config: Optional[dict] = None,
    num_gpus_per_actor: Optional[Union[int, float, str]] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    device: Optional[Device] = None,
):
    """
    `__init__(...)`: Initialize the NEProblem.

    Args:
        objective_sense: The objective sense, expected as "min" or "max"
            for single-objective cases, or as a sequence of strings
            (each string being "min" or "max") for multi-objective cases.
        network: A network structure string, or a Callable (which can be
            a class inheriting from `torch.nn.Module`, or a function
            which returns a `torch.nn.Module` instance), or an instance
            of `torch.nn.Module`.
            The object provided here determines the structure of the
            neural network whose parameters will be evolved.
            A network structure string is a string which can be processed
            by `evotorch.neuroevolution.net.str_to_net(...)`.
            Please see the documentation of the function
            `evotorch.neuroevolution.net.str_to_net(...)` to see what such
            a neural network structure string looks like.
        network_eval_func: Optionally a function (or any Callable object)
            which receives a PyTorch module as its argument, and returns
            either a fitness, or a two-element tuple containing the fitness
            and the additional evaluation data. The fitness can be a scalar
            (for single-objective cases) or a 1-dimensional tensor (for
            multi-objective cases). The additional evaluation data is
            expected as a 1-dimensional tensor.
            If this argument is left as None, it will be expected that
            the method `_evaluate_network(...)` is overridden by the
            inheriting class.
        network_args: Optionally a dict-like object, storing keyword
            arguments to be passed to the network while instantiating it.
        initial_bounds: Specifies an interval from which the values of the
            initial neural network parameters will be drawn.
        eval_dtype: dtype to be used for fitnesses. If not specified, then
            `eval_dtype` will be inferred from the dtype of the parameters
            of the neural network.
            In more detail, if the neural network's parameters have a
            float dtype, `eval_dtype` will be a compatible float.
            Otherwise, it will be "float32".
        eval_data_length: Length of the extra evaluation data.
        seed: Random number seed. If left as None, this NEProblem instance
            will not have its own random generator, and the global random
            generator of PyTorch will be used instead.
        num_actors: Number of actors to create for parallelized
            evaluation of the solutions.
            Certain string values are also accepted.
            When given as "max" or as "num_cpus", the number of actors
            will be equal to the number of all available CPUs in the ray
            cluster.
            When given as "num_gpus", the number of actors will be
            equal to the number of all available GPUs in the ray
            cluster, and each actor will be assigned a GPU.
            When given as "num_devices", the number of actors will be
            equal to the minimum among the number of CPUs and the number
            of GPUs available in the cluster (or will be equal to the
            number of CPUs if there is no GPU), and each actor will be
            assigned a GPU (if available).
            If `num_actors` is given as "num_gpus" or "num_devices",
            the argument `num_gpus_per_actor` must not be used,
            and the `actor_config` dictionary must not contain the
            key "num_gpus".
            If `num_actors` is given as something other than "num_gpus"
            or "num_devices", and if you wish to assign GPUs to each
            actor, then please see the argument `num_gpus_per_actor`.
        actor_config: A dictionary, representing the keyword arguments
            to be passed to the options(...) used when creating the
            ray actor objects. To be used for explicitly allocating
            resources per each actor.
            For example, for declaring that each actor is to use a GPU,
            one can pass `actor_config=dict(num_gpus=1)`.
            Can also be given as None (which is the default),
            if no such options are to be passed.
        num_gpus_per_actor: Number of GPUs to be allocated by each
            remote actor.
            The default behavior is to NOT allocate any GPU at all
            (which is the default behavior of the ray library as well).
            When given as a number `n`, each actor will be given
            `n` GPUs (where `n` can be an integer, or can be a `float`
            for fractional allocation).
            When given as a string "max", then the available GPUs
            across the entire ray cluster (or within the local computer
            in the simplest cases) will be equally distributed among
            the actors.
            When given as a string "all", then each actor will have
            access to all the GPUs (this will be achieved by suppressing
            the environment variable `CUDA_VISIBLE_DEVICES` for each
            actor).
            When the problem is not distributed (i.e. when there are
            no actors), this argument is expected to be left as None.
        num_subbatches: If `num_subbatches` is None (assuming that
            `subbatch_size` is also None), then, when evaluating a
            population, the population will be split into n pieces, `n`
            being the number of actors, and each actor will evaluate
            its assigned piece. If `num_subbatches` is an integer `m`,
            then the population will be split into `m` pieces,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
            While using a distributed algorithm, this argument determines
            how many sub-batches will be generated, and therefore,
            how many gradients will be computed by the remote actors.
        subbatch_size: If `subbatch_size` is None (assuming that
            `num_subbatches` is also None), then, when evaluating a
            population, the population will be split into `n` pieces, `n`
            being the number of actors, and each actor will evaluate its
            assigned piece. If `subbatch_size` is an integer `m`,
            then the population will be split into pieces of size `m`,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            When there can be significant difference across the solutions
            in terms of computational requirements, specifying a
            `subbatch_size` can be beneficial, because, while one
            actor is busy with a subbatch containing computationally
            challenging solutions, other actors can accept more
            tasks and save time.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
            While using a distributed algorithm, this argument determines
            the size of a sub-batch (or sub-population) sampled by a
            remote actor for computing a gradient.
            In distributed mode, it is expected that the population size
            is divisible by `subbatch_size`.
        device: Default device in which a new population will be generated
            and the neural networks will operate.
            If not specified, "cpu" will be used.
    """
    # Set the main device of the problem
    # Although the operation of setting the main device is done by the main Problem class,
    # here we need this at an earlier stage.
    if device is None:
        device = "cpu"
    self._device = torch.device(device)

    # Set the network
    self._original_network = network
    self._network_args = {} if network_args is None else deepcopy(network_args)
    if isinstance(self._original_network, nn.Module):
        self._original_network = self._original_network.cpu()

    # Store the function that will evaluate the network, if available
    self._network_eval_func: Optional[Callable] = network_eval_func

    self.instantiated_network: nn.Module = None

    # Create temporary network
    temp_network = self._instantiate_net(self._original_network, device="cpu")

    super().__init__(
        objective_sense=objective_sense,
        initial_bounds=initial_bounds,
        bounds=None,  # Neuroevolution is an unbounded problem
        solution_length=count_parameters(temp_network),  # The solution length is inherited from the network passed
        dtype=next(temp_network.parameters()).dtype,  # The datatype is inherited from the network passed
        eval_dtype=eval_dtype,
        device=device,
        eval_data_length=eval_data_length,
        seed=seed,
        num_actors=num_actors,
        num_gpus_per_actor=num_gpus_per_actor,
        actor_config=actor_config,
        num_subbatches=num_subbatches,
        subbatch_size=subbatch_size,
        store_solution_stats=None,
    )

make_net(self, parameters)

Make a new network filled with the provided parameters.

Parameters:

    parameters : Iterable (required)
        Parameters to be used as weights within the network. Can be a
        Solution, or any 1-dimensional Iterable that can be converted to a
        PyTorch tensor.

Returns:

    nn.Module
        A new network, as a torch.nn.Module instance.

Source code in evotorch/neuroevolution/neproblem.py
def make_net(self, parameters: Iterable) -> nn.Module:
    """
    Make a new network filled with the provided parameters.

    Args:
        parameters: Parameters to be used as weights within the network.
            Can be a Solution, or any 1-dimensional Iterable that can be
            converted to a PyTorch tensor.
    Returns:
        A new network, as a `torch.nn.Module` instance.
    """
    if isinstance(parameters, Solution):
        parameters = parameters.access_values(keep_evals=True)
    else:
        parameters = self.as_tensor(parameters)
    with torch.no_grad():
        net = deepcopy(self.parameterize_net(parameters))
    return net
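
A short usage sketch: given an NEProblem (or subclass) instance named problem, an independent copy of the network can be built from any parameter vector of the right length:

import torch

random_params = torch.randn(problem.solution_length)
net = problem.make_net(random_params)  # a deep-copied, standalone module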

network_constants(self)

Named constants which can be passed to the network instantiation

Source code in evotorch/neuroevolution/neproblem.py
def network_constants(self) -> dict:
    """Named constants which can be passed to the network instantiation"""
    constants = {}
    constants.update(self._network_constants)
    constants.update(self._network_args)
    return constants

parameterize_net(self, parameters)

Parameterize the network with a given set of parameters.

Parameters:

    parameters : torch.Tensor (required)
        The parameters with which to instantiate the network.

Returns:

    instantiated_network : nn.Module
        The network instantiated with the parameters.

Source code in evotorch/neuroevolution/neproblem.py
def parameterize_net(self, parameters: torch.Tensor) -> nn.Module:
    """Parameterize the network with a given set of parameters.
    Args:
        parameters (torch.Tensor): The parameters with which to instantiate the network
    Returns:
        instantiated_network (nn.Module): The network instantiated with the parameters
    """
    # Check if network exists
    if self.instantiated_network is None:
        self.instantiated_network = self._instantiate_net(self._original_network)

    network = self.instantiated_network

    # Move the parameters if needed
    if parameters.device != self.network_device:
        parameters = parameters.to(self.network_device)

    # Fill the network with the parameters
    fill_parameters(network, parameters)

    # Return the network
    return network
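
Note that, unlike make_net(...), this method fills the problem's single instantiated_network in place and returns it; successive calls overwrite the same module's weights. A sketch of the practical difference (params_a and params_b are hypothetical parameter tensors):

net1 = problem.parameterize_net(params_a)
net2 = problem.parameterize_net(params_b)
assert net1 is net2  # the same underlying module, now carrying params_b

copied = problem.make_net(params_a)  # a deep copy, unaffected by later calls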

net special

Utility classes and functions for neural networks

functional

ModuleExpectingFlatParameters

A wrapper which brings a functional interface around a torch module.

Similar to functorch.FunctionalModule, ModuleExpectingFlatParameters turns a torch.nn.Module instance into a function which expects a new leftmost argument representing the parameters of the network. Unlike functorch.FunctionalModule, a ModuleExpectingFlatParameters instance, as its name suggests, expects the network parameters to be given as a 1-dimensional (i.e. flattened) tensor. Also, unlike functorch.FunctionalModule, an instance of ModuleExpectingFlatParameters is NOT an instance of torch.nn.Module.

PyTorch modules with buffers can be wrapped by this class, but it is assumed that those buffers are constant. If the wrapped module changes the value(s) of its buffer(s) during its forward passes, most probably things will NOT work right.

As an example, let us consider the following linear layer.

import torch
from torch import nn

net = nn.Linear(3, 8)

The functional counterpart of net can be obtained via:

from evotorch.neuroevolution.net import ModuleExpectingFlatParameters

fnet = ModuleExpectingFlatParameters(net)

Now, fnet is a callable object which expects network parameters and network inputs. Let us call fnet with randomly generated network parameters and with a randomly generated input tensor.

param_length = fnet.parameter_length
random_parameters = torch.randn(param_length)
random_input = torch.randn(3)

result = fnet(random_parameters, random_input)
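
Because the forward pass is routed through the supplied parameter vector, one can also differentiate through the wrapper with respect to the parameters (as long as disable_autograd_tracking is left as False). A small sketch continuing the example above:

params = torch.randn(param_length, requires_grad=True)
loss = fnet(params, random_input).sum()
loss.backward()
print(params.grad.shape)  # for the Linear(3, 8) above: torch.Size([32])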
Source code in evotorch/neuroevolution/net/functional.py
class ModuleExpectingFlatParameters:
    """
    A wrapper which brings a functional interface around a torch module.

    Similar to `functorch.FunctionalModule`, `ModuleExpectingFlatParameters`
    turns a `torch.nn.Module` instance into a function which expects a new
    leftmost argument representing the parameters of the network.
    Unlike `functorch.FunctionalModule`, a `ModuleExpectingFlatParameters`
    instance, as its name suggests, expects the network parameters to be
    given as a 1-dimensional (i.e. flattened) tensor.
    Also, unlike `functorch.FunctionalModule`, an instance of
    `ModuleExpectingFlatParameters` is NOT an instance of `torch.nn.Module`.

    PyTorch modules with buffers can be wrapped by this class, but it is
    assumed that those buffers are constant. If the wrapped module changes
    the value(s) of its buffer(s) during its forward passes, most probably
    things will NOT work right.

    As an example, let us consider the following linear layer.

    ```python
    import torch
    from torch import nn

    net = nn.Linear(3, 8)
    ```

    The functional counterpart of `net` can be obtained via:

    ```python
    from evotorch.neuroevolution.net import ModuleExpectingFlatParameters

    fnet = ModuleExpectingFlatParameters(net)
    ```

    Now, `fnet` is a callable object which expects network parameters
    and network inputs. Let us call `fnet` with randomly generated network
    parameters and with a randomly generated input tensor.

    ```python
    param_length = fnet.parameter_length
    random_parameters = torch.randn(param_length)
    random_input = torch.randn(3)

    result = fnet(random_parameters, random_input)
    ```
    """

    @torch.no_grad()
    def __init__(self, net: nn.Module, *, disable_autograd_tracking: bool = False):
        """
        `__init__(...)`: Initialize the `ModuleExpectingFlatParameters` instance.

        Args:
            net: The module that is to be wrapped by a functional interface.
            disable_autograd_tracking: If given as True, all operations
                regarding the wrapped module will be performed in the context
                `torch.no_grad()`, forcefully disabling the autograd.
                If given as False, autograd will not be affected.
                The default is False.
        """

        # Declare the variables which will store information regarding the parameters of the module.
        self.__param_names = []
        self.__param_shapes = []
        self.__param_length = 0
        self.__param_slices = []
        self.__num_params = 0

        # Iterate over the parameters of the module and fill the related information.
        i = 0
        j = 0
        for pname, p in net.named_parameters():
            self.__param_names.append(pname)

            shape = p.shape
            self.__param_shapes.append(shape)

            length = _shape_length(shape)
            self.__param_length += length

            j = i + length
            self.__param_slices.append(slice(i, j))
            i = j

            self.__num_params += 1

        self.__buffer_dict = {bname: b.clone() for bname, b in net.named_buffers()}

        self.__net = deepcopy(net)
        self.__net.to("meta")
        self.__disable_autograd_tracking = bool(disable_autograd_tracking)

    def __transfer_buffers(self, x: torch.Tensor):
        """
        Transfer the buffer tensors to the device of the given tensor.

        Args:
            x: The tensor whose device will also store the buffer tensors.
        """
        for bname in self.__buffer_dict.keys():
            self.__buffer_dict[bname] = torch.as_tensor(self.__buffer_dict[bname], device=x.device)

    @property
    def buffers(self) -> tuple:
        """Get the stored buffers"""
        return tuple(self.__buffer_dict)

    @property
    def parameter_length(self) -> int:
        return self.__param_length

    def __call__(self, parameter_vector: torch.Tensor, x: torch.Tensor, h: Any = None) -> Any:
        """
        Call the wrapped module's forward pass procedure.

        Args:
            parameter_vector: A 1-dimensional tensor which represents the
                parameters of the network.
            x: The inputs.
            h: Hidden state(s), in case this is a recurrent network.
        Returns:
            The result of the forward pass.
        """
        if parameter_vector.ndim != 1:
            raise ValueError(
                f"Expected the parameters as 1 dimensional,"
                f" but the received parameter vector has {parameter_vector.ndim} dimensions"
            )
        if len(parameter_vector) != self.__param_length:
            raise ValueError(
                f"Expected a parameter vector of length {self.__param_length},"
                f" but the received parameter vector's length is {len(parameter_vector)}."
            )
        state_args = [] if h is None else [h]

        params_and_buffers = {}
        for i, pname in enumerate(self.__param_names):
            param_slice = self.__param_slices[i]
            param_shape = self.__param_shapes[i]
            param = parameter_vector[param_slice].reshape(param_shape)
            params_and_buffers[pname] = param

        # Make sure that the buffer tensors are in the same device with x
        self.__transfer_buffers(x)

        # Add the buffer tensors to the dictionary `params_and_buffers`
        params_and_buffers.update(self.__buffer_dict)

        # Prepare the no-gradient context if gradient tracking is disabled
        context = torch.no_grad() if self.__disable_autograd_tracking else nullcontext()

        # Run the module and return the results
        with context:
            return functional_call(self.__net, params_and_buffers, tuple([x, *state_args]))
buffers: tuple property readonly

Get the stored buffers

__call__(self, parameter_vector, x, h=None) special

Call the wrapped module's forward pass procedure.

Parameters:

    parameter_vector : Tensor (required)
        A 1-dimensional tensor which represents the parameters of the
        network.
    x : Tensor (required)
        The inputs.
    h : Any (default: None)
        Hidden state(s), in case this is a recurrent network.

Returns:

    Any
        The result of the forward pass.

Source code in evotorch/neuroevolution/net/functional.py
def __call__(self, parameter_vector: torch.Tensor, x: torch.Tensor, h: Any = None) -> Any:
    """
    Call the wrapped module's forward pass procedure.

    Args:
        parameter_vector: A 1-dimensional tensor which represents the
            parameters of the network.
        x: The inputs.
        h: Hidden state(s), in case this is a recurrent network.
    Returns:
        The result of the forward pass.
    """
    if parameter_vector.ndim != 1:
        raise ValueError(
            f"Expected the parameters as 1 dimensional,"
            f" but the received parameter vector has {parameter_vector.ndim} dimensions"
        )
    if len(parameter_vector) != self.__param_length:
        raise ValueError(
            f"Expected a parameter vector of length {self.__param_length},"
            f" but the received parameter vector's length is {len(parameter_vector)}."
        )
    state_args = [] if h is None else [h]

    params_and_buffers = {}
    for i, pname in enumerate(self.__param_names):
        param_slice = self.__param_slices[i]
        param_shape = self.__param_shapes[i]
        param = parameter_vector[param_slice].reshape(param_shape)
        params_and_buffers[pname] = param

    # Make sure that the buffer tensors are in the same device with x
    self.__transfer_buffers(x)

    # Add the buffer tensors to the dictionary `params_and_buffers`
    params_and_buffers.update(self.__buffer_dict)

    # Prepare the no-gradient context if gradient tracking is disabled
    context = torch.no_grad() if self.__disable_autograd_tracking else nullcontext()

    # Run the module and return the results
    with context:
        return functional_call(self.__net, params_and_buffers, tuple([x, *state_args]))
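
As a sketch of the recurrent case, a module whose forward pass accepts a hidden state (torch.nn.RNNCell is used here purely for illustration) can be called with its state passed via h:

import torch
from torch import nn
from evotorch.neuroevolution.net import ModuleExpectingFlatParameters

cell = nn.RNNCell(4, 6)
fcell = ModuleExpectingFlatParameters(cell)

params = torch.randn(fcell.parameter_length)
x = torch.randn(4)
h = torch.zeros(6)
h_next = fcell(params, x, h)  # forward(input, hx) returns the next hidden state
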
__init__(self, net, *, disable_autograd_tracking=False) special

__init__(...): Initialize the ModuleExpectingFlatParameters instance.

Parameters:

    net : Module (required)
        The module that is to be wrapped by a functional interface.
    disable_autograd_tracking : bool (default: False)
        If given as True, all operations regarding the wrapped module will
        be performed in the context torch.no_grad(), forcefully disabling
        the autograd. If given as False, autograd will not be affected.
Source code in evotorch/neuroevolution/net/functional.py
@torch.no_grad()
def __init__(self, net: nn.Module, *, disable_autograd_tracking: bool = False):
    """
    `__init__(...)`: Initialize the `ModuleExpectingFlatParameters` instance.

    Args:
        net: The module that is to be wrapped by a functional interface.
        disable_autograd_tracking: If given as True, all operations
            regarding the wrapped module will be performed in the context
            `torch.no_grad()`, forcefully disabling the autograd.
            If given as False, autograd will not be affected.
            The default is False.
    """

    # Declare the variables which will store information regarding the parameters of the module.
    self.__param_names = []
    self.__param_shapes = []
    self.__param_length = 0
    self.__param_slices = []
    self.__num_params = 0

    # Iterate over the parameters of the module and fill the related information.
    i = 0
    j = 0
    for pname, p in net.named_parameters():
        self.__param_names.append(pname)

        shape = p.shape
        self.__param_shapes.append(shape)

        length = _shape_length(shape)
        self.__param_length += length

        j = i + length
        self.__param_slices.append(slice(i, j))
        i = j

        self.__num_params += 1

    self.__buffer_dict = {bname: b.clone() for bname, b in net.named_buffers()}

    self.__net = deepcopy(net)
    self.__net.to("meta")
    self.__disable_autograd_tracking = bool(disable_autograd_tracking)

make_functional_module(net, *, disable_autograd_tracking=False)

Wrap a torch module so that it has a functional interface.

Similar to functorch.make_functional(...), this function turns a torch.nn.Module instance to a function which expects a new leftmost argument representing the parameters of the network. Unlike with functorch.make_functional(...), the parameters of the network are expected in a 1-dimensional (i.e. flattened) tensor.

PyTorch modules with buffers can be wrapped by this class, but it is assumed that those buffers are constant. If the wrapped module changes the value(s) of its buffer(s) during its forward passes, most probably things will NOT work right.

As an example, let us consider the following linear layer.

import torch
from torch import nn

net = nn.Linear(3, 8)

The functional counterpart of net can be obtained via:

from evotorch.neuroevolution.net import make_functional_module

fnet = make_functional_module(net)

Now, fnet is a callable object which expects network parameters and network inputs. Let us call fnet with randomly generated network parameters and with a randomly generated input tensor.

param_length = fnet.parameter_length
random_parameters = torch.randn(param_length)
random_input = torch.randn(3)

result = fnet(random_parameters, random_input)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| net | Module | The torch.nn.Module instance to be wrapped by a functional interface. | required |
| disable_autograd_tracking | bool | If given as True, all operations regarding the wrapped module will be performed in the context torch.no_grad(), forcefully disabling the autograd. If given as False, autograd will not be affected. The default is False. | False |

Returns:

| Type | Description |
| --- | --- |
| ModuleExpectingFlatParameters | The functional wrapper, as an instance of evotorch.neuroevolution.net.ModuleExpectingFlatParameters. |

Source code in evotorch/neuroevolution/net/functional.py
def make_functional_module(net: nn.Module, *, disable_autograd_tracking: bool = False) -> ModuleExpectingFlatParameters:
    """
    Wrap a torch module so that it has a functional interface.

    Similar to `functorch.make_functional(...)`, this function turns a
    `torch.nn.Module` instance to a function which expects a new leftmost
    argument representing the parameters of the network.
    Unlike with `functorch.make_functional(...)`, the parameters of the
    network are expected in a 1-dimensional (i.e. flattened) tensor.

    PyTorch modules with buffers can be wrapped by this class, but it is
    assumed that those buffers are constant. If the wrapped module changes
    the value(s) of its buffer(s) during its forward passes, most probably
    things will NOT work right.

    As an example, let us consider the following linear layer.

    ```python
    import torch
    from torch import nn

    net = nn.Linear(3, 8)
    ```

    The functional counterpart of `net` can be obtained via:

    ```python
    from evotorch.neuroevolution.net import make_functional_module

    fnet = make_functional_module(net)
    ```

    Now, `fnet` is a callable object which expects network parameters
    and network inputs. Let us call `fnet` with randomly generated network
    parameters and with a randomly generated input tensor.

    ```python
    param_length = fnet.parameter_length
    random_parameters = torch.randn(param_length)
    random_input = torch.randn(3)

    result = fnet(random_parameters, random_input)
    ```

    Args:
        net: The `torch.nn.Module` instance to be wrapped by a functional
            interface.
        disable_autograd_tracking: If given as True, all operations
            regarding the wrapped module will be performed in the context
            `torch.no_grad()`, forcefully disabling the autograd.
            If given as False, autograd will not be affected.
            The default is False.
    Returns:
        The functional wrapper, as an instance of
        `evotorch.neuroevolution.net.ModuleExpectingFlatParameters`.
    """
    return ModuleExpectingFlatParameters(net, disable_autograd_tracking=disable_autograd_tracking)
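
Since the wrapper expects its parameters as an ordinary tensor argument, evaluating a whole population of candidate solutions reduces to calling it once per flat parameter vector. A minimal sketch (the population tensor below is hypothetical):

```python
import torch
from torch import nn
from evotorch.neuroevolution.net import make_functional_module

net = nn.Linear(3, 8)
fnet = make_functional_module(net)

# Hypothetical population: 10 candidate solutions, each a flat parameter vector.
population = torch.randn(10, fnet.parameter_length)
x = torch.randn(3)

outputs = torch.stack([fnet(solution, x) for solution in population])
print(outputs.shape)  # torch.Size([10, 8])
```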

layers

Various neural network layer types

Apply (Module)

A torch module for applying an arithmetic operator on an input tensor

Source code in evotorch/neuroevolution/net/layers.py
class Apply(nn.Module):
    """A torch module for applying an arithmetic operator on an input tensor"""

    def __init__(self, operator: str, argument: float):
        """`__init__(...)`: Initialize the Apply module.

        Args:
            operator: Must be '+', '-', '*', '/', or '**'.
                Indicates which operation will be done
                on the input tensor.
            argument: Expected as a float, represents
                the right-argument of the operation
                (the left-argument being the input
                tensor).
        """
        nn.Module.__init__(self)

        self._operator = str(operator)
        assert self._operator in ("+", "-", "*", "/", "**")

        self._argument = float(argument)

    def forward(self, x):
        op = self._operator
        arg = self._argument
        if op == "+":
            return x + arg
        elif op == "-":
            return x - arg
        elif op == "*":
            return x * arg
        elif op == "/":
            return x / arg
        elif op == "**":
            return x**arg
        else:
            raise ValueError("Unknown operator:" + repr(op))

    def extra_repr(self):
        return "operator={}, argument={}".format(repr(self._operator), self._argument)
__init__(self, operator, argument) special

__init__(...): Initialize the Apply module.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| operator | str | Must be '+', '-', '*', '/', or '**'. Indicates which operation will be done on the input tensor. | required |
| argument | float | Expected as a float, represents the right-argument of the operation (the left-argument being the input tensor). | required |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, operator: str, argument: float):
    """`__init__(...)`: Initialize the Apply module.

    Args:
        operator: Must be '+', '-', '*', '/', or '**'.
            Indicates which operation will be done
            on the input tensor.
        argument: Expected as a float, represents
            the right-argument of the operation
            (the left-argument being the input
            tensor).
    """
    nn.Module.__init__(self)

    self._operator = str(operator)
    assert self._operator in ("+", "-", "*", "/", "**")

    self._argument = float(argument)
extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Source code in evotorch/neuroevolution/net/layers.py
def extra_repr(self):
    return "operator={}, argument={}".format(repr(self._operator), self._argument)
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x):
    op = self._operator
    arg = self._argument
    if op == "+":
        return x + arg
    elif op == "-":
        return x - arg
    elif op == "*":
        return x * arg
    elif op == "/":
        return x / arg
    elif op == "**":
        return x**arg
    else:
        raise ValueError("Unknown operator:" + repr(op))
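
To illustrate, a small sketch (the import path below is assumed from the source location shown above):

```python
import torch
from evotorch.neuroevolution.net.layers import Apply  # assumed import path

scale = Apply("*", 0.5)
print(scale(torch.tensor([2.0, 4.0])))  # tensor([1., 2.])
```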

Bin (Module)

A small torch module for binning the values of tensors.

In more detail: considering a lower bound value lb, an upper bound value ub, and an input tensor x, each value within x closer to lb will be converted to lb, and each value within x closer to ub will be converted to ub.

Source code in evotorch/neuroevolution/net/layers.py
class Bin(nn.Module):
    """A small torch module for binning the values of tensors.

    In more details, considering a lower bound value lb,
    an upper bound value ub, and an input tensor x,
    each value within x closer to lb will be converted to lb
    and each value within x closer to ub will be converted to ub.
    """

    def __init__(self, lb: float, ub: float):
        """`__init__(...)`: Initialize the Bin operator.

        Args:
            lb: Lower bound
            ub: Upper bound
        """
        nn.Module.__init__(self)
        self._lb = float(lb)
        self._ub = float(ub)
        self._interval_size = self._ub - self._lb
        self._shrink_amount = self._interval_size / 2.0
        self._shift_amount = (self._ub + self._lb) / 2.0

    def forward(self, x: torch.Tensor):
        x = x - self._shift_amount
        x = x / self._shrink_amount
        x = torch.sign(x)
        x = x * self._shrink_amount
        x = x + self._shift_amount
        return x

    def extra_repr(self):
        return "lb={}, ub={}".format(self._lb, self._ub)
__init__(self, lb, ub) special

`__init__(...)`: Initialize the Bin operator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lb | float | Lower bound | required |
| ub | float | Upper bound | required |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, lb: float, ub: float):
    """`__init__(...)`: Initialize the Bin operator.

    Args:
        lb: Lower bound
        ub: Upper bound
    """
    nn.Module.__init__(self)
    self._lb = float(lb)
    self._ub = float(ub)
    self._interval_size = self._ub - self._lb
    self._shrink_amount = self._interval_size / 2.0
    self._shift_amount = (self._ub + self._lb) / 2.0
extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Source code in evotorch/neuroevolution/net/layers.py
def extra_repr(self):
    return "lb={}, ub={}".format(self._lb, self._ub)
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor):
    x = x - self._shift_amount
    x = x / self._shrink_amount
    x = torch.sign(x)
    x = x * self._shrink_amount
    x = x + self._shift_amount
    return x
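
A quick sketch of the binning behavior (import path assumed): with lb=0 and ub=1, every value below the midpoint 0.5 snaps to 0 and every value above it snaps to 1.

```python
import torch
from evotorch.neuroevolution.net.layers import Bin  # assumed import path

binner = Bin(0.0, 1.0)
print(binner(torch.tensor([-0.2, 0.3, 0.7])))  # tensor([0., 0., 1.])
```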

Clip (Module)

A small torch module for clipping the values of tensors

Source code in evotorch/neuroevolution/net/layers.py
class Clip(nn.Module):
    """A small torch module for clipping the values of tensors"""

    def __init__(self, lb: float, ub: float):
        """`__init__(...)`: Initialize the Clip operator.

        Args:
            lb: Lower bound. Values less than this will be clipped.
            ub: Upper bound. Values greater than this will be clipped.
        """
        nn.Module.__init__(self)
        self._lb = float(lb)
        self._ub = float(ub)

    def forward(self, x: torch.Tensor):
        return x.clamp(self._lb, self._ub)

    def extra_repr(self):
        return "lb={}, ub={}".format(self._lb, self._ub)
__init__(self, lb, ub) special

__init__(...): Initialize the Clip operator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lb | float | Lower bound. Values less than this will be clipped. | required |
| ub | float | Upper bound. Values greater than this will be clipped. | required |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, lb: float, ub: float):
    """`__init__(...)`: Initialize the Clip operator.

    Args:
        lb: Lower bound. Values less than this will be clipped.
        ub: Upper bound. Values greater than this will be clipped.
    """
    nn.Module.__init__(self)
    self._lb = float(lb)
    self._ub = float(ub)
extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Source code in evotorch/neuroevolution/net/layers.py
def extra_repr(self):
    return "lb={}, ub={}".format(self._lb, self._ub)
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor):
    return x.clamp(self._lb, self._ub)
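
For example (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import Clip  # assumed import path

clip = Clip(-1.0, 1.0)
print(clip(torch.tensor([-2.0, 0.3, 5.0])))  # tensor([-1.0000, 0.3000, 1.0000])
```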

FeedForwardNet (Module)

Representation of a feed forward neural network as a torch Module.

An example initialization of a FeedForwardNet is as follows:

net = FeedForwardNet(4, [(8, 'tanh'), (6, 'tanh')])

which means that we would like to have a network which expects an input vector of length 4 and passes its input through 2 tanh-activated hidden layers (with neuron counts of 8 and 6, respectively). The output of the last hidden layer (of length 6) is the final output vector.

The string representation of the module obtained via the example above is:

FeedForwardNet(
  (layer_0): Linear(in_features=4, out_features=8, bias=True)
  (actfunc_0): Tanh()
  (layer_1): Linear(in_features=8, out_features=6, bias=True)
  (actfunc_1): Tanh()
)
Source code in evotorch/neuroevolution/net/layers.py
class FeedForwardNet(nn.Module):
    """
    Representation of a feed forward neural network as a torch Module.

    An example initialization of a FeedForwardNet is as follows:

        net = FeedForwardNet(4, [(8, 'tanh'), (6, 'tanh')])

    which means that we would like to have a network which expects an input
    vector of length 4 and passes its input through 2 tanh-activated hidden
    layers (with neuron counts of 8 and 6, respectively).
    The output of the last hidden layer (of length 6) is the final
    output vector.

    The string representation of the module obtained via the example above
    is:

        FeedForwardNet(
          (layer_0): Linear(in_features=4, out_features=8, bias=True)
          (actfunc_0): Tanh()
          (layer_1): Linear(in_features=8, out_features=6, bias=True)
          (actfunc_1): Tanh()
        )
    """

    LengthActTuple = Tuple[int, Union[str, Callable]]
    LengthActBiasTuple = Tuple[int, Union[str, Callable], Union[bool]]

    def __init__(self, input_size: int, layers: List[Union[LengthActTuple, LengthActBiasTuple]]):
        """`__init__(...)`: Initialize the FeedForward network.

        Args:
            input_size: Input size of the network, expected as an int.
            layers: Expected as a list of tuples,
                where each tuple is either of the form
                `(layer_size, activation_function)`
                or of the form
                `(layer_size, activation_function, bias)`
                in which
                (i) `layer_size` is an int, specifying the number of neurons;
                (ii) `activation_function` is None, or a callable object,
                or a string containing the name of the activation function
                ('relu', 'selu', 'elu', 'tanh', 'hardtanh', or 'sigmoid');
                (iii) `bias` is a boolean, specifying whether the layer
                is to have a bias or not.
                When omitted, bias is set to True.
        """

        nn.Module.__init__(self)

        for i, layer in enumerate(layers):
            if len(layer) == 2:
                size, actfunc = layer
                bias = True
            elif len(layer) == 3:
                size, actfunc, bias = layer
            else:
                assert False, "A layer tuple of invalid size is encountered"

            setattr(self, "layer_" + str(i), nn.Linear(input_size, size, bias=bias))

            if isinstance(actfunc, str):
                if actfunc == "relu":
                    actfunc = nn.ReLU()
                elif actfunc == "selu":
                    actfunc = nn.SELU()
                elif actfunc == "elu":
                    actfunc = nn.ELU()
                elif actfunc == "tanh":
                    actfunc = nn.Tanh()
                elif actfunc == "hardtanh":
                    actfunc = nn.Hardtanh()
                elif actfunc == "sigmoid":
                    actfunc = nn.Sigmoid()
                elif actfunc == "round":
                    actfunc = Round()
                else:
                    raise ValueError("Unknown activation function: " + repr(actfunc))

            setattr(self, "actfunc_" + str(i), actfunc)

            input_size = size

    def forward(self, x):
        i = 0
        while hasattr(self, "layer_" + str(i)):
            x = getattr(self, "layer_" + str(i))(x)
            f = getattr(self, "actfunc_" + str(i))
            if f is not None:
                x = f(x)
            i += 1
        return x
__init__(self, input_size, layers) special

__init__(...): Initialize the FeedForward network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_size | int | Input size of the network, expected as an int. | required |
| layers | List[Union[Tuple[int, Union[str, Callable]], Tuple[int, Union[str, Callable], bool]]] | Expected as a list of tuples, where each tuple is either of the form (layer_size, activation_function) or of the form (layer_size, activation_function, bias), in which (i) layer_size is an int, specifying the number of neurons; (ii) activation_function is None, or a callable object, or a string containing the name of the activation function ('relu', 'selu', 'elu', 'tanh', 'hardtanh', or 'sigmoid'); (iii) bias is a boolean, specifying whether the layer is to have a bias or not. When omitted, bias is set to True. | required |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, input_size: int, layers: List[Union[LengthActTuple, LengthActBiasTuple]]):
    """`__init__(...)`: Initialize the FeedForward network.

    Args:
        input_size: Input size of the network, expected as an int.
        layers: Expected as a list of tuples,
            where each tuple is either of the form
            `(layer_size, activation_function)`
            or of the form
            `(layer_size, activation_function, bias)`
            in which
            (i) `layer_size` is an int, specifying the number of neurons;
            (ii) `activation_function` is None, or a callable object,
            or a string containing the name of the activation function
            ('relu', 'selu', 'elu', 'tanh', 'hardtanh', or 'sigmoid');
            (iii) `bias` is a boolean, specifying whether the layer
            is to have a bias or not.
            When omitted, bias is set to True.
    """

    nn.Module.__init__(self)

    for i, layer in enumerate(layers):
        if len(layer) == 2:
            size, actfunc = layer
            bias = True
        elif len(layer) == 3:
            size, actfunc, bias = layer
        else:
            assert False, "A layer tuple of invalid size is encountered"

        setattr(self, "layer_" + str(i), nn.Linear(input_size, size, bias=bias))

        if isinstance(actfunc, str):
            if actfunc == "relu":
                actfunc = nn.ReLU()
            elif actfunc == "selu":
                actfunc = nn.SELU()
            elif actfunc == "elu":
                actfunc = nn.ELU()
            elif actfunc == "tanh":
                actfunc = nn.Tanh()
            elif actfunc == "hardtanh":
                actfunc = nn.Hardtanh()
            elif actfunc == "sigmoid":
                actfunc = nn.Sigmoid()
            elif actfunc == "round":
                actfunc = Round()
            else:
                raise ValueError("Unknown activation function: " + repr(actfunc))

        setattr(self, "actfunc_" + str(i), actfunc)

        input_size = size
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x):
    i = 0
    while hasattr(self, "layer_" + str(i)):
        x = getattr(self, "layer_" + str(i))(x)
        f = getattr(self, "actfunc_" + str(i))
        if f is not None:
            x = f(x)
        i += 1
    return x
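
A minimal runnable sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import FeedForwardNet  # assumed import path

net = FeedForwardNet(4, [(8, "tanh"), (6, "tanh")])
out = net(torch.randn(4))
print(out.shape)  # torch.Size([6])
```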

LSTM (Module)

Source code in evotorch/neuroevolution/net/layers.py
class LSTM(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        *,
        dtype: torch.dtype = torch.float32,
        device: Union[str, torch.device] = "cpu",
    ):
        super().__init__()
        input_size = int(input_size)
        hidden_size = int(hidden_size)

        self.input_size = input_size
        self.hidden_size = hidden_size

        def input_weight():
            return nn.Parameter(torch.randn(self.hidden_size, self.input_size, dtype=dtype, device=device))

        def weight():
            return nn.Parameter(torch.randn(self.hidden_size, self.hidden_size, dtype=dtype, device=device))

        def bias():
            return nn.Parameter(torch.zeros(self.hidden_size, dtype=dtype, device=device))

        self.W_ii = input_weight()
        self.W_if = input_weight()
        self.W_ig = input_weight()
        self.W_io = input_weight()

        self.W_hi = weight()
        self.W_hf = weight()
        self.W_hg = weight()
        self.W_ho = weight()

        self.b_ii = bias()
        self.b_if = bias()
        self.b_ig = bias()
        self.b_io = bias()

        self.b_hi = bias()
        self.b_hf = bias()
        self.b_hg = bias()
        self.b_ho = bias()

    def forward(self, x: torch.Tensor, hidden=None) -> tuple:
        sigm = torch.sigmoid
        tanh = torch.tanh

        if hidden is None:
            h_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
            c_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        else:
            h_prev, c_prev = hidden

        i_t = sigm(self.W_ii @ x + self.b_ii + self.W_hi @ h_prev + self.b_hi)
        f_t = sigm(self.W_if @ x + self.b_if + self.W_hf @ h_prev + self.b_hf)
        g_t = tanh(self.W_ig @ x + self.b_ig + self.W_hg @ h_prev + self.b_hg)
        o_t = sigm(self.W_io @ x + self.b_io + self.W_ho @ h_prev + self.b_ho)
        c_t = f_t * c_prev + i_t * g_t
        h_t = o_t * tanh(c_t)

        return h_t, (h_t, c_t)

    def __repr__(self) -> str:
        clsname = type(self).__name__
        return f"{clsname}(input_size={self.input_size}, hidden_size={self.hidden_size})"
forward(self, x, hidden=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor, hidden=None) -> tuple:
    sigm = torch.sigmoid
    tanh = torch.tanh

    if hidden is None:
        h_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        c_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
    else:
        h_prev, c_prev = hidden

    i_t = sigm(self.W_ii @ x + self.b_ii + self.W_hi @ h_prev + self.b_hi)
    f_t = sigm(self.W_if @ x + self.b_if + self.W_hf @ h_prev + self.b_hf)
    g_t = tanh(self.W_ig @ x + self.b_ig + self.W_hg @ h_prev + self.b_hg)
    o_t = sigm(self.W_io @ x + self.b_io + self.W_ho @ h_prev + self.b_ho)
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * tanh(c_t)

    return h_t, (h_t, c_t)
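
This layer processes a single timestep per call, so a sequence is handled by threading the `(h, c)` state through successive calls. A minimal sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import LSTM  # assumed import path

cell = LSTM(input_size=3, hidden_size=5)

hidden = None
for x in torch.randn(10, 3):  # a sequence of 10 observations
    y, hidden = cell(x, hidden)  # hidden is the tuple (h_t, c_t)
print(y.shape)  # torch.Size([5])
```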

LSTMNet (Module)

Source code in evotorch/neuroevolution/net/layers.py
class LSTM(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        *,
        dtype: torch.dtype = torch.float32,
        device: Union[str, torch.device] = "cpu",
    ):
        super().__init__()
        input_size = int(input_size)
        hidden_size = int(hidden_size)

        self.input_size = input_size
        self.hidden_size = hidden_size

        def input_weight():
            return nn.Parameter(torch.randn(self.hidden_size, self.input_size, dtype=dtype, device=device))

        def weight():
            return nn.Parameter(torch.randn(self.hidden_size, self.hidden_size, dtype=dtype, device=device))

        def bias():
            return nn.Parameter(torch.zeros(self.hidden_size, dtype=dtype, device=device))

        self.W_ii = input_weight()
        self.W_if = input_weight()
        self.W_ig = input_weight()
        self.W_io = input_weight()

        self.W_hi = weight()
        self.W_hf = weight()
        self.W_hg = weight()
        self.W_ho = weight()

        self.b_ii = bias()
        self.b_if = bias()
        self.b_ig = bias()
        self.b_io = bias()

        self.b_hi = bias()
        self.b_hf = bias()
        self.b_hg = bias()
        self.b_ho = bias()

    def forward(self, x: torch.Tensor, hidden=None) -> tuple:
        sigm = torch.sigmoid
        tanh = torch.tanh

        if hidden is None:
            h_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
            c_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        else:
            h_prev, c_prev = hidden

        i_t = sigm(self.W_ii @ x + self.b_ii + self.W_hi @ h_prev + self.b_hi)
        f_t = sigm(self.W_if @ x + self.b_if + self.W_hf @ h_prev + self.b_hf)
        g_t = tanh(self.W_ig @ x + self.b_ig + self.W_hg @ h_prev + self.b_hg)
        o_t = sigm(self.W_io @ x + self.b_io + self.W_ho @ h_prev + self.b_ho)
        c_t = f_t * c_prev + i_t * g_t
        h_t = o_t * tanh(c_t)

        return h_t, (h_t, c_t)

    def __repr__(self) -> str:
        clsname = type(self).__name__
        return f"{clsname}(input_size={self.input_size}, hidden_size={self.hidden_size})"
forward(self, x, hidden=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor, hidden=None) -> tuple:
    sigm = torch.sigmoid
    tanh = torch.tanh

    if hidden is None:
        h_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        c_prev = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
    else:
        h_prev, c_prev = hidden

    i_t = sigm(self.W_ii @ x + self.b_ii + self.W_hi @ h_prev + self.b_hi)
    f_t = sigm(self.W_if @ x + self.b_if + self.W_hf @ h_prev + self.b_hf)
    g_t = tanh(self.W_ig @ x + self.b_ig + self.W_hg @ h_prev + self.b_hg)
    o_t = sigm(self.W_io @ x + self.b_io + self.W_ho @ h_prev + self.b_ho)
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * tanh(c_t)

    return h_t, (h_t, c_t)

LocomotorNet (Module)

This is a control network which consists of two components: one linear, and one non-linear. The non-linear component is an input-independent set of sinusoidal waves whose amplitudes, frequencies and phases are trainable. Upon execution of a forward pass, the output of the non-linear component is the sum of all these sinusoidal waves. The linear component is a linear layer (optionally with bias) whose weights (and biases) are trainable. The final output of the LocomotorNet at the end of a forward pass is the sum of the linear and the non-linear components.

Note that this is a stateful network, where the only state is the timestep t, which starts from 0 and gets incremented by 1 at the end of each forward pass. The reset() method resets t back to 0.

Reference

Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018). Structured Control Nets for Deep Reinforcement Learning.

Source code in evotorch/neuroevolution/net/layers.py
class LocomotorNet(nn.Module):
    """LocomotorNet: A locomotion-specific structured control net.

    This is a control network which consists of two components:
    one linear, and one non-linear. The non-linear component
    is an input-independent set of sinusoidal waves whose
    amplitudes, frequencies and phases are trainable.
    Upon execution of a forward pass, the output of the non-linear
    component is the sum of all these sinusoidal waves.
    The linear component is a linear layer (optionally with bias)
    whose weights (and biases) are trainable.
    The final output of the LocomotorNet at the end of a forward pass
    is the sum of the linear and the non-linear components.

    Note that this is a stateful network, where the only state
    is the timestep t, which starts from 0 and gets incremented by 1
    at the end of each forward pass. The `reset()` method resets
    t back to 0.

    Reference:
        Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018).
        Structured Control Nets for Deep Reinforcement Learning.
    """

    def __init__(self, *, in_features: int, out_features: int, bias: bool = True, num_sinusoids=16):
        """`__init__(...)`: Initialize the LocomotorNet.

        Args:
            in_features: Length of the input vector
            out_features: Length of the output vector
            bias: Whether or not the linear component is to have a bias
            num_sinusoids: Number of sinusoidal waves
        """

        nn.Module.__init__(self)

        self._in_features = in_features
        self._out_features = out_features
        self._bias = bias
        self._num_sinusoids = num_sinusoids

        self._linear_component = nn.Linear(
            in_features=self._in_features, out_features=self._out_features, bias=self._bias
        )

        self._amplitudes = nn.ParameterList()
        self._frequencies = nn.ParameterList()
        self._phases = nn.ParameterList()

        for _ in range(self._num_sinusoids):
            for paramlist in (self._amplitudes, self._frequencies, self._phases):
                paramlist.append(nn.Parameter(torch.randn(self._out_features, dtype=torch.float32)))

        self.reset()

    def reset(self):
        """Set the timestep t to 0"""
        self._t = 0

    @property
    def t(self) -> int:
        """The current timestep t"""
        return self._t

    @property
    def in_features(self) -> int:
        """Get the length of the input vector"""
        return self._in_features

    @property
    def out_features(self) -> int:
        """Get the length of the output vector"""
        return self._out_features

    @property
    def num_sinusoids(self) -> int:
        """Get the number of sinusoidal waves of the non-linear component"""
        return self._num_sinusoids

    @property
    def bias(self) -> bool:
        """Get whether or not the linear component has bias"""
        return self._bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Execute a forward pass"""
        u_linear = self._linear_component(x)

        t = self._t
        u_nonlinear = torch.zeros(self._out_features)
        for i in range(self._num_sinusoids):
            A = self._amplitudes[i]
            w = self._frequencies[i]
            phi = self._phases[i]
            u_nonlinear = u_nonlinear + (A * torch.sin(w * t + phi))

        self._t += 1

        return u_linear + u_nonlinear
bias: bool property readonly

Get whether or not the linear component has bias

in_features: int property readonly

Get the length of the input vector

num_sinusoids: int property readonly

Get the number of sinusoidal waves of the non-linear component

out_features: int property readonly

Get the length of the output vector

t: int property readonly

The current timestep t

__init__(self, *, in_features, out_features, bias=True, num_sinusoids=16) special

__init__(...): Initialize the LocomotorNet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| in_features | int | Length of the input vector | required |
| out_features | int | Length of the output vector | required |
| bias | bool | Whether or not the linear component is to have a bias | True |
| num_sinusoids | | Number of sinusoidal waves | 16 |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, *, in_features: int, out_features: int, bias: bool = True, num_sinusoids=16):
    """`__init__(...)`: Initialize the LocomotorNet.

    Args:
        in_features: Length of the input vector
        out_features: Length of the output vector
        bias: Whether or not the linear component is to have a bias
        num_sinusoids: Number of sinusoidal waves
    """

    nn.Module.__init__(self)

    self._in_features = in_features
    self._out_features = out_features
    self._bias = bias
    self._num_sinusoids = num_sinusoids

    self._linear_component = nn.Linear(
        in_features=self._in_features, out_features=self._out_features, bias=self._bias
    )

    self._amplitudes = nn.ParameterList()
    self._frequencies = nn.ParameterList()
    self._phases = nn.ParameterList()

    for _ in range(self._num_sinusoids):
        for paramlist in (self._amplitudes, self._frequencies, self._phases):
            paramlist.append(nn.Parameter(torch.randn(self._out_features, dtype=torch.float32)))

    self.reset()
forward(self, x)

Execute a forward pass

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Execute a forward pass"""
    u_linear = self._linear_component(x)

    t = self._t
    u_nonlinear = torch.zeros(self._out_features)
    for i in range(self._num_sinusoids):
        A = self._amplitudes[i]
        w = self._frequencies[i]
        phi = self._phases[i]
        u_nonlinear = u_nonlinear + (A * torch.sin(w * t + phi))

    self._t += 1

    return u_linear + u_nonlinear
reset(self)

Set the timestep t to 0

Source code in evotorch/neuroevolution/net/layers.py
def reset(self):
    """Set the timestep t to 0"""
    self._t = 0
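
Because the only state is the timestep t, two forward passes with the same input generally differ, and `reset()` should be called between episodes. A minimal sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import LocomotorNet  # assumed import path

policy = LocomotorNet(in_features=4, out_features=2, num_sinusoids=8)

obs = torch.randn(4)
a0 = policy(obs)  # computed with t=0
a1 = policy(obs)  # computed with t=1; the sinusoidal component has advanced
policy.reset()    # back to t=0 for the next episode
```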

RNN (Module)

Source code in evotorch/neuroevolution/net/layers.py
class RNN(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        nonlinearity: str = "tanh",
        *,
        dtype: torch.dtype = torch.float32,
        device: Union[str, torch.device] = "cpu",
    ):
        super().__init__()

        input_size = int(input_size)
        hidden_size = int(hidden_size)
        nonlinearity = str(nonlinearity)

        self.W1 = nn.Parameter(torch.randn(hidden_size, input_size, dtype=dtype, device=device))
        self.W2 = nn.Parameter(torch.randn(hidden_size, hidden_size, dtype=dtype, device=device))
        self.b1 = nn.Parameter(torch.zeros(hidden_size, dtype=dtype, device=device))
        self.b2 = nn.Parameter(torch.zeros(hidden_size, dtype=dtype, device=device))

        if nonlinearity == "tanh":
            self.actfunc = torch.tanh
        else:
            self.actfunc = getattr(nnf, nonlinearity)

        self.nonlinearity = nonlinearity
        self.input_size = input_size
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor, h: Optional[torch.Tensor] = None) -> tuple:
        if h is None:
            h = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        act = self.actfunc
        W1 = self.W1
        W2 = self.W2
        b1 = self.b1.unsqueeze(-1)
        b2 = self.b2.unsqueeze(-1)
        x = x.unsqueeze(-1)
        h = h.unsqueeze(-1)
        y = act(((W1 @ x) + b1) + ((W2 @ h) + b2))
        y = y.squeeze(-1)
        return y, y

    def __repr__(self) -> str:
        clsname = type(self).__name__
        return f"{clsname}(input_size={self.input_size}, hidden_size={self.hidden_size}, nonlinearity={repr(self.nonlinearity)})"
forward(self, x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor, h: Optional[torch.Tensor] = None) -> tuple:
    if h is None:
        h = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
    act = self.actfunc
    W1 = self.W1
    W2 = self.W2
    b1 = self.b1.unsqueeze(-1)
    b2 = self.b2.unsqueeze(-1)
    x = x.unsqueeze(-1)
    h = h.unsqueeze(-1)
    y = act(((W1 @ x) + b1) + ((W2 @ h) + b2))
    y = y.squeeze(-1)
    return y, y
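
Like the LSTM above, this cell handles one timestep per call; the returned output doubles as the next hidden state. A minimal sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import RNN  # assumed import path

cell = RNN(input_size=3, hidden_size=5, nonlinearity="tanh")

h = None
for x in torch.randn(10, 3):  # a sequence of 10 observations
    y, h = cell(x, h)
print(y.shape)  # torch.Size([5])
```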

RecurrentNet (Module)

Source code in evotorch/neuroevolution/net/layers.py
class RNN(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        nonlinearity: str = "tanh",
        *,
        dtype: torch.dtype = torch.float32,
        device: Union[str, torch.device] = "cpu",
    ):
        super().__init__()

        input_size = int(input_size)
        hidden_size = int(hidden_size)
        nonlinearity = str(nonlinearity)

        self.W1 = nn.Parameter(torch.randn(hidden_size, input_size, dtype=dtype, device=device))
        self.W2 = nn.Parameter(torch.randn(hidden_size, hidden_size, dtype=dtype, device=device))
        self.b1 = nn.Parameter(torch.zeros(hidden_size, dtype=dtype, device=device))
        self.b2 = nn.Parameter(torch.zeros(hidden_size, dtype=dtype, device=device))

        if nonlinearity == "tanh":
            self.actfunc = torch.tanh
        else:
            self.actfunc = getattr(nnf, nonlinearity)

        self.nonlinearity = nonlinearity
        self.input_size = input_size
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor, h: Optional[torch.Tensor] = None) -> tuple:
        if h is None:
            h = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
        act = self.actfunc
        W1 = self.W1
        W2 = self.W2
        b1 = self.b1.unsqueeze(-1)
        b2 = self.b2.unsqueeze(-1)
        x = x.unsqueeze(-1)
        h = h.unsqueeze(-1)
        y = act(((W1 @ x) + b1) + ((W2 @ h) + b2))
        y = y.squeeze(-1)
        return y, y

    def __repr__(self) -> str:
        clsname = type(self).__name__
        return f"{clsname}(input_size={self.input_size}, hidden_size={self.hidden_size}, nonlinearity={repr(self.nonlinearity)})"
forward(self, x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor, h: Optional[torch.Tensor] = None) -> tuple:
    if h is None:
        h = torch.zeros(self.hidden_size, dtype=x.dtype, device=x.device)
    act = self.actfunc
    W1 = self.W1
    W2 = self.W2
    b1 = self.b1.unsqueeze(-1)
    b2 = self.b2.unsqueeze(-1)
    x = x.unsqueeze(-1)
    h = h.unsqueeze(-1)
    y = act(((W1 @ x) + b1) + ((W2 @ h) + b2))
    y = y.squeeze(-1)
    return y, y

Round (Module)

A small torch module for rounding the values of an input tensor

Source code in evotorch/neuroevolution/net/layers.py
class Round(nn.Module):
    """A small torch module for rounding the values of an input tensor"""

    def __init__(self, ndigits: int = 0):
        nn.Module.__init__(self)
        self._ndigits = int(ndigits)
        self._q = 10.0**self._ndigits

    def forward(self, x):
        x = x * self._q
        x = torch.round(x)
        x = x / self._q
        return x

    def extra_repr(self):
        return "ndigits=" + str(self._ndigits)
extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Source code in evotorch/neuroevolution/net/layers.py
def extra_repr(self):
    return "ndigits=" + str(self._ndigits)
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x):
    x = x * self._q
    x = torch.round(x)
    x = x / self._q
    return x
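
Since the module multiplies by 10**ndigits, rounds, and divides back, ndigits controls the decimal place at which rounding happens. A minimal sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import Round  # assumed import path

round2 = Round(ndigits=2)
print(round2(torch.tensor([0.1234, 3.14159])))  # tensor([0.1200, 3.1400])
```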

Slice (Module)

A small torch module for getting the slice of an input tensor

Source code in evotorch/neuroevolution/net/layers.py
class Slice(nn.Module):
    """A small torch module for getting the slice of an input tensor"""

    def __init__(self, from_index: int, to_index: int):
        """`__init__(...)`: Initialize the Slice operator.

        Args:
            from_index: The index from which the slice begins.
            to_index: The exclusive index at which the slice ends.
        """
        nn.Module.__init__(self)
        self._from_index = from_index
        self._to_index = to_index

    def forward(self, x):
        return x[self._from_index : self._to_index]

    def extra_repr(self):
        return "from_index={}, to_index={}".format(self._from_index, self._to_index)
__init__(self, from_index, to_index) special

__init__(...): Initialize the Slice operator.

Parameters:

Name Type Description Default
from_index int

The index from which the slice begins.

required
to_index int

The exclusive index at which the slice ends.

required
Source code in evotorch/neuroevolution/net/layers.py
def __init__(self, from_index: int, to_index: int):
    """`__init__(...)`: Initialize the Slice operator.

    Args:
        from_index: The index from which the slice begins.
        to_index: The exclusive index at which the slice ends.
    """
    nn.Module.__init__(self)
    self._from_index = from_index
    self._to_index = to_index
extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Source code in evotorch/neuroevolution/net/layers.py
def extra_repr(self):
    return "from_index={}, to_index={}".format(self._from_index, self._to_index)
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x):
    return x[self._from_index : self._to_index]
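
For example (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import Slice  # assumed import path

middle = Slice(1, 3)
print(middle(torch.tensor([10.0, 20.0, 30.0, 40.0])))  # tensor([20., 30.])
```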

StructuredControlNet (Module)

Structured Control Net.

This is a control network consisting of two components: (i) a non-linear component, which is a feed-forward network; and (ii) a linear component, which is a linear layer. Both components take the input vector provided to the structured control network. The final output is the sum of the outputs of both components.

Reference

Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018). Structured Control Nets for Deep Reinforcement Learning.

Source code in evotorch/neuroevolution/net/layers.py
class StructuredControlNet(nn.Module):
    """Structured Control Net.

    This is a control network consisting of two components:
    (i) a non-linear component, which is a feed-forward network; and
    (ii) a linear component, which is a linear layer.
    Both components take the input vector provided to the
    structured control network.
    The final output is the sum of the outputs of both components.

    Reference:
        Mario Srouji, Jian Zhang, Ruslan Salakhutdinov (2018).
        Structured Control Nets for Deep Reinforcement Learning.
    """

    def __init__(
        self,
        *,
        in_features: int,
        out_features: int,
        num_layers: int,
        hidden_size: int,
        bias: bool = True,
        nonlinearity: Union[str, Callable] = "tanh",
    ):
        """`__init__(...)`: Initialize the structured control net.

        Args:
            in_features: Length of the input vector
            out_features: Length of the output vector
            num_layers: Number of hidden layers for the non-linear component
            hidden_size: Number of neurons in a hidden layer of the
                non-linear component
            bias: Whether or not the linear component is to have bias
            nonlinearity: Activation function
        """

        nn.Module.__init__(self)

        self._in_features = in_features
        self._out_features = out_features
        self._num_layers = num_layers
        self._hidden_size = hidden_size
        self._bias = bias
        self._nonlinearity = nonlinearity

        self._linear_component = nn.Linear(
            in_features=self._in_features, out_features=self._out_features, bias=self._bias
        )

        self._nonlinear_component = FeedForwardNet(
            input_size=self._in_features,
            layers=(
                list((self._hidden_size, self._nonlinearity) for _ in range(self._num_layers))
                + [(self._out_features, self._nonlinearity)]
            ),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """TODO: documentation"""
        return self._linear_component(x) + self._nonlinear_component(x)

    @property
    def in_features(self):
        """TODO: documentation"""
        return self._in_features

    @property
    def out_features(self):
        """TODO: documentation"""
        return self._out_features

    @property
    def num_layers(self):
        """TODO: documentation"""
        return self._num_layers

    @property
    def hidden_size(self):
        """TODO: documentation"""
        return self._hidden_size

    @property
    def bias(self):
        """TODO: documentation"""
        return self._bias

    @property
    def nonlinearity(self):
        """TODO: documentation"""
        return self._nonlinearity
bias property readonly
hidden_size property readonly
in_features property readonly
nonlinearity property readonly
num_layers property readonly
out_features property readonly
__init__(self, *, in_features, out_features, num_layers, hidden_size, bias=True, nonlinearity='tanh') special

__init__(...): Initialize the structured control net.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| in_features | int | Length of the input vector | required |
| out_features | int | Length of the output vector | required |
| num_layers | int | Number of hidden layers for the non-linear component | required |
| hidden_size | int | Number of neurons in a hidden layer of the non-linear component | required |
| bias | bool | Whether or not the linear component is to have bias | True |
| nonlinearity | Union[str, Callable] | Activation function | 'tanh' |
Source code in evotorch/neuroevolution/net/layers.py
def __init__(
    self,
    *,
    in_features: int,
    out_features: int,
    num_layers: int,
    hidden_size: int,
    bias: bool = True,
    nonlinearity: Union[str, Callable] = "tanh",
):
    """`__init__(...)`: Initialize the structured control net.

    Args:
        in_features: Length of the input vector
        out_features: Length of the output vector
        num_layers: Number of hidden layers for the non-linear component
        hidden_size: Number of neurons in a hidden layer of the
            non-linear component
        bias: Whether or not the linear component is to have bias
        nonlinearity: Activation function
    """

    nn.Module.__init__(self)

    self._in_features = in_features
    self._out_features = out_features
    self._num_layers = num_layers
    self._hidden_size = hidden_size
    self._bias = bias
    self._nonlinearity = nonlinearity

    self._linear_component = nn.Linear(
        in_features=self._in_features, out_features=self._out_features, bias=self._bias
    )

    self._nonlinear_component = FeedForwardNet(
        input_size=self._in_features,
        layers=(
            list((self._hidden_size, self._nonlinearity) for _ in range(self._num_layers))
            + [(self._out_features, self._nonlinearity)]
        ),
    )
forward(self, x)
Source code in evotorch/neuroevolution/net/layers.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """TODO: documentation"""
    return self._linear_component(x) + self._nonlinear_component(x)
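
A minimal sketch (import path assumed):

```python
import torch
from evotorch.neuroevolution.net.layers import StructuredControlNet  # assumed import path

policy = StructuredControlNet(in_features=4, out_features=2, num_layers=2, hidden_size=16)
action = policy(torch.randn(4))
print(action.shape)  # torch.Size([2])
```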

misc

Utilities for reading and for writing neural network parameters

count_parameters(net)

Get the number of parameters of the network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| net | Module | The torch module whose parameters will be counted. | required |

Returns:

| Type | Description |
| --- | --- |
| int | The number of parameters, as an integer. |

Source code in evotorch/neuroevolution/net/misc.py
def count_parameters(net: nn.Module) -> int:
    """
    Get the number of parameters of the network.

    Args:
        net: The torch module whose parameters will be counted.
    Returns:
        The number of parameters, as an integer.
    """

    count = 0

    for p in net.parameters():
        count += p.numel()

    return count

device_of_module(m, default=None)

Get the device in which the module exists.

This function looks at the first parameter of the module, and returns its device. This function is not meant to be used on modules whose parameters exist on different devices.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| m | Module | The module whose device is being queried. | required |
| default | Union[str, torch.device] | The fallback device to return if the module has no parameters. If this is left as None, the fallback device is assumed to be "cpu". | None |

Returns:

| Type | Description |
| --- | --- |
| device | The device of the module, determined from its first parameter. |

Source code in evotorch/neuroevolution/net/misc.py
def device_of_module(m: nn.Module, default: Optional[Union[str, torch.device]] = None) -> torch.device:
    """
    Get the device in which the module exists.

    This function looks at the first parameter of the module, and returns
    its device. This function is not meant to be used on modules whose
    parameters exist on different devices.

    Args:
        m: The module whose device is being queried.
        default: The fallback device to return if the module has no
            parameters. If this is left as None, the fallback device
            is assumed to be "cpu".
    Returns:
        The device of the module, determined from its first parameter.
    """
    if default is None:
        default = torch.device("cpu")

    device = default

    for p in m.parameters():
        device = p.device
        break

    return device

fill_parameters(net, vector)

Fill the parameters of a torch module (net) from a vector.

No gradient information is kept.

The vector's length must be exactly the same as the number of parameters of the PyTorch module.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| net | Module | The torch module whose parameter values will be filled. | required |
| vector | Tensor | A 1-D torch tensor which stores the parameter values. | required |
Source code in evotorch/neuroevolution/net/misc.py
@torch.no_grad()
def fill_parameters(net: nn.Module, vector: torch.Tensor):
    """Fill the parameters of a torch module (net) from a vector.

    No gradient information is kept.

    The vector's length must be exactly the same as the number
    of parameters of the PyTorch module.

    Args:
        net: The torch module whose parameter values will be filled.
        vector: A 1-D torch tensor which stores the parameter values.
    """
    address = 0
    for p in net.parameters():
        d = p.data.view(-1)
        n = len(d)
        d[:] = torch.as_tensor(vector[address : address + n], device=d.device)
        address += n

    if address != len(vector):
        raise IndexError("The parameter vector is larger than expected")

parameter_vector(net, *, device=None)

Get all the parameters of a torch module (net) into a vector

No gradient information is kept.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| net | Module | The torch module whose parameters will be extracted. | required |
| device | Union[str, torch.device] | The device in which the parameter vector will be constructed. If the network has parameters across multiple devices, you can specify this argument so that the concatenation of all the parameters succeeds. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The parameters of the module in a 1-D tensor. |

Source code in evotorch/neuroevolution/net/misc.py
@torch.no_grad()
def parameter_vector(net: nn.Module, *, device: Optional[Device] = None) -> torch.Tensor:
    """Get all the parameters of a torch module (net) into a vector

    No gradient information is kept.

    Args:
        net: The torch module whose parameters will be extracted.
        device: The device in which the parameter vector will be constructed.
            If the network has parameter across multiple devices,
            you can specify this argument so that concatenation of all the
            parameters will be successful.
    Returns:
        The parameters of the module in a 1-D tensor.
    """
    dev_kwarg = {} if device is None else {"device": device}

    all_vectors = []
    for p in net.parameters():
        all_vectors.append(torch.as_tensor(p.data.view(-1), **dev_kwarg))

    return torch.cat(all_vectors)
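
These utilities compose naturally: parameter_vector flattens a module's parameters, count_parameters gives the expected length, and fill_parameters writes a (possibly modified) vector back. A minimal round-trip sketch (import path assumed):

```python
import torch
from torch import nn
from evotorch.neuroevolution.net.misc import (  # assumed import path
    count_parameters,
    fill_parameters,
    parameter_vector,
)

net = nn.Linear(3, 8)

flat = parameter_vector(net)
assert len(flat) == count_parameters(net)  # 3 * 8 weights + 8 biases = 32

# Perturb the parameters and write them back into the module.
fill_parameters(net, flat + 0.01 * torch.randn_like(flat))
```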

multilayered

MultiLayered (Module)

Source code in evotorch/neuroevolution/net/multilayered.py
class MultiLayered(nn.Module):
    def __init__(self, *layers: nn.Module):
        super().__init__()
        self._submodules = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor, h: Optional[dict] = None):
        if h is None:
            h = {}

        new_h = {}

        for i, layer in enumerate(self._submodules):
            layer_h = h.get(i, None)
            if layer_h is None:
                layer_result = layer(x)
            else:
                layer_result = layer(x, h[i])

            if isinstance(layer_result, tuple):
                if len(layer_result) == 2:
                    x, layer_new_h = layer_result
                else:
                    raise ValueError(
                        f"The layer number {i} returned a tuple of length {len(layer_result)}."
                        f" A tensor or a tuple of two elements was expected."
                    )
            elif isinstance(layer_result, torch.Tensor):
                x = layer_result
                layer_new_h = None
            else:
                raise TypeError(
                    f"The layer number {i} returned an object of type {type(layer_result)}."
                    f" A tensor or a tuple of two elements was expected."
                )

            if layer_new_h is not None:
                new_h[i] = layer_new_h

        if len(new_h) == 0:
            return x
        else:
            return x, new_h

    def __iter__(self):
        return self._submodules.__iter__()

    def __getitem__(self, i):
        return self._submodules[i]

    def __len__(self):
        return len(self._submodules)

    def append(self, module: nn.Module):
        self._submodules.append(module)
forward(self, x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

.. note:: Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/multilayered.py
def forward(self, x: torch.Tensor, h: Optional[dict] = None):
    if h is None:
        h = {}

    new_h = {}

    for i, layer in enumerate(self._submodules):
        layer_h = h.get(i, None)
        if layer_h is None:
            layer_result = layer(x)
        else:
            layer_result = layer(x, h[i])

        if isinstance(layer_result, tuple):
            if len(layer_result) == 2:
                x, layer_new_h = layer_result
            else:
                raise ValueError(
                    f"The layer number {i} returned a tuple of length {len(layer_result)}."
                    f" A tensor or a tuple of two elements was expected."
                )
        elif isinstance(layer_result, torch.Tensor):
            x = layer_result
            layer_new_h = None
        else:
            raise TypeError(
                f"The layer number {i} returned an object of type {type(layer_result)}."
                f" A tensor or a tuple of two elements was expected."
            )

        if layer_new_h is not None:
            new_h[i] = layer_new_h

    if len(new_h) == 0:
        return x
    else:
        return x, new_h
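
Below is a minimal sketch of how the hidden-state dictionary behaves; the `Accumulator` layer is a toy module invented for this illustration, not part of EvoTorch:

```python
import torch
from torch import nn
from evotorch.neuroevolution.net import MultiLayered


class Accumulator(nn.Module):
    # A toy recurrent layer: returns the running sum of its inputs,
    # using that same sum as its hidden state.
    def forward(self, x, h=None):
        total = x if h is None else x + h
        return total, total


net = MultiLayered(nn.Linear(4, 8), Accumulator(), nn.Tanh())

x = torch.randn(4)
y, h = net(x)       # h is a dict keyed by layer index: here {1: <state>}
y2, h2 = net(x, h)  # pass the dict back in on subsequent calls
```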

parser

Utilities for parsing string representations of neural net policies

NetParsingError (Exception)

Representation of a parsing error

Source code in evotorch/neuroevolution/net/parser.py
class NetParsingError(Exception):
    """
    Representation of a parsing error
    """

    def __init__(
        self,
        message: str,
        lineno: Optional[int] = None,
        col_offset: Optional[int] = None,
        original_error: Optional[Exception] = None,
    ):
        """
        `__init__(...)`: Initialize the NetParsingError.

        Args:
            message: Error message, as string.
            lineno: Erroneous line number in the string representation of the
                neural network structure.
            col_offset: Erroneous column number in the string representation
                of the neural network structure.
            original_error: If another error caused this parsing error,
                that original error can be attached to this `NetParsingError`
                instance via this argument.
        """
        super().__init__()
        self.message = message
        self.lineno = lineno
        self.col_offset = col_offset
        self.original_error = original_error

    def _to_string(self) -> str:
        parts = []

        parts.append(type(self).__name__)

        if self.lineno is not None:
            parts.append(" at line(")
            parts.append(str(self.lineno - 1))
            parts.append(")")

        if self.col_offset is not None:
            parts.append(" at column(")
            parts.append(str(self.col_offset + 1))
            parts.append(")")

        parts.append(": ")
        parts.append(self.message)

        return "".join(parts)

    def __str__(self) -> str:
        return self._to_string()

    def __repr__(self) -> str:
        return self._to_string()
__init__(self, message, lineno=None, col_offset=None, original_error=None) special

__init__(...): Initialize the NetParsingError.

Parameters:

Name Type Description Default
message str

Error message, as string.

required
lineno Optional[int]

Erroneous line number in the string representation of the neural network structure.

None
col_offset Optional[int]

Erroneous column number in the string representation of the neural network structure.

None
original_error Optional[Exception]

If another error caused this parsing error, that original error can be attached to this NetParsingError instance via this argument.

None
Source code in evotorch/neuroevolution/net/parser.py
def __init__(
    self,
    message: str,
    lineno: Optional[int] = None,
    col_offset: Optional[int] = None,
    original_error: Optional[Exception] = None,
):
    """
    `__init__(...)`: Initialize the NetParsingError.

    Args:
        message: Error message, as string.
        lineno: Erroneous line number in the string representation of the
            neural network structure.
        col_offset: Erroneous column number in the string representation
            of the neural network structure.
        original_error: If another error caused this parsing error,
            that original error can be attached to this `NetParsingError`
            instance via this argument.
    """
    super().__init__()
    self.message = message
    self.lineno = lineno
    self.col_offset = col_offset
    self.original_error = original_error
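
As an illustrative sketch, assuming that the parser raises `NetParsingError` upon encountering an unknown module name:

```python
from evotorch.neuroevolution.net import str_to_net
from evotorch.neuroevolution.net.parser import NetParsingError

try:
    str_to_net("Linear(8, 16) >> NoSuchLayer()")
except NetParsingError as e:
    print(e)  # the message includes line/column info when available
```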

str_to_net(s, **constants)

Read a string representation of a neural net structure, and return a torch.nn.Module instance out of it.

Let us imagine that one wants to describe the following neural network structure:

from torch import nn
from evotorch.neuroevolution.net import MultiLayered

net = MultiLayered(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4, bias=False), nn.ReLU())

By using str_to_net(...) one can construct an equivalent module via:

from evotorch.neuroevolution.net import str_to_net

net = str_to_net("Linear(8, 16) >> Tanh() >> Linear(16, 4, bias=False) >> ReLU()")

The string can also be multi-line:

net = str_to_net(
    '''
    Linear(8, 16)
    >> Tanh()
    >> Linear(16, 4, bias=False)
    >> ReLU()
    '''
)

One can also define constants for using them in strings:

net = str_to_net(
    '''
    Linear(input_size, hidden_size)
    >> Tanh()
    >> Linear(hidden_size, output_size, bias=False)
    >> ReLU()
    ''',
    input_size=8,
    hidden_size=16,
    output_size=4,
)

In the neural net structure string, when one refers to a module type, say, Linear, first the name Linear is searched for in the namespace evotorch.neuroevolution.net.layers, and then in the namespace torch.nn. In the case of Linear, the searched name exists in torch.nn, and therefore, the layer type to be instantiated is accepted as torch.nn.Linear. Instead of Linear, if one had used the name, say, StructuredControlNet, then, the layer type to be instantiated would be evotorch.neuroevolution.net.layers.StructuredControlNet.

The namespace evotorch.neuroevolution.net.layers contains its own implementations for RNN and LSTM. These recurrent layer implementations work similarly to their counterparts torch.nn.RNN and torch.nn.LSTM, except that EvoTorch's implementations do not expect the data with extra leftmost dimensions for batching and for timesteps. Instead, they expect to receive a single input and a single current hidden state, and produce a single output and a single new hidden state. These recurrent layer implementations of EvoTorch can be used within a neural net structure string. Therefore, the following examples are valid:

rnn1 = str_to_net("RNN(4, 8) >> Linear(8, 2)")

rnn2 = str_to_net(
    '''
    Linear(4, 10)
    >> Tanh()
    >> RNN(input_size=10, hidden_size=24, nonlinearity='tanh')
    >> Linear(24, 2)
    '''
)

lstm1 = str_to_net("LSTM(4, 32) >> Linear(32, 2)")

lstm2 = str_to_net("LSTM(input_size=4, hidden_size=32) >> Linear(32, 2)")

Notes regarding usage with evotorch.neuroevolution.GymNE or with evotorch.neuroevolution.VecGymNE:

While instantiating a GymNE or a VecGymNE, one can specify a neural net structure string as the policy. Therefore, while filling the policy string for a GymNE, all these rules mentioned above apply. Additionally, while using str_to_net(...) internally, GymNE and VecGymNE define these extra constants: obs_length (length of the observation vector), act_length (length of the action vector for continuous-action environments, or number of actions for discrete-action environments), and obs_shape (shape of the observation as a tuple, assuming that the observation space is of type gym.spaces.Box, usable within the string like obs_shape[0], obs_shape[1], etc., or simply obs_shape to refer to the entire tuple).

Therefore, while instantiating a GymNE or a VecGymNE, one can define a single-hidden-layered policy via this string:

"Linear(obs_length, 16) >> Tanh() >> Linear(16, act_length) >> Tanh()"

In the policy string above, one might choose to omit the last Tanh(), as GymNE and VecGymNE will clip the final output of the policy to conform to the action boundaries defined by the target reinforcement learning environment, and such a clipping operation might be seen as using an activation function similar to hard-tanh anyway.
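
For instance, a hypothetical instantiation using such a policy string might look like the sketch below (the environment name and actor count are illustrative):

```python
from evotorch.neuroevolution import GymNE

# GymNE fills in obs_length and act_length from the environment's spaces.
problem = GymNE(
    env="CartPole-v1",
    network="Linear(obs_length, 16) >> Tanh() >> Linear(16, act_length)",
    num_actors=1,
)
```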

Parameters:

Name Type Description Default
s str

The string which expresses the neural net structure.

required

Returns:

Type Description
Module

The PyTorch module of the specified structure.

Source code in evotorch/neuroevolution/net/parser.py
def str_to_net(s: str, **constants) -> nn.Module:
    """
    Read a string representation of a neural net structure,
    and return a `torch.nn.Module` instance out of it.

    Let us imagine that one wants to describe the following
    neural network structure:

    ```python
    from torch import nn
    from evotorch.neuroevolution.net import MultiLayered

    net = MultiLayered(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4, bias=False), nn.ReLU())
    ```

    By using `str_to_net(...)` one can construct an equivalent
    module via:

    ```python
    from evotorch.neuroevolution.net import str_to_net

    net = str_to_net("Linear(8, 16) >> Tanh() >> Linear(16, 4, bias=False) >> ReLU()")
    ```

    The string can also be multi-line:

    ```python
    net = str_to_net(
        '''
        Linear(8, 16)
        >> Tanh()
        >> Linear(16, 4, bias=False)
        >> ReLU()
        '''
    )
    ```

    One can also define constants for using them in strings:

    ```python
    net = str_to_net(
        '''
        Linear(input_size, hidden_size)
        >> Tanh()
        >> Linear(hidden_size, output_size, bias=False)
        >> ReLU()
        ''',
        input_size=8,
        hidden_size=16,
        output_size=4,
    )
    ```

    In the neural net structure string, when one refers to a module type,
    say, `Linear`, first the name `Linear` is searched for in the namespace
    `evotorch.neuroevolution.net.layers`, and then in the namespace `torch.nn`.
    In the case of `Linear`, the searched name exists in `torch.nn`,
    and therefore, the layer type to be instantiated is accepted as
    `torch.nn.Linear`.
    Instead of `Linear`, if one had used the name, say,
    `StructuredControlNet`, then, the layer type to be instantiated
    would be `evotorch.neuroevolution.net.layers.StructuredControlNet`.

    The namespace `evotorch.neuroevolution.net.layers` contains its own
    implementations for RNN and LSTM. These recurrent layer implementations
    work similarly to their counterparts `torch.nn.RNN` and `torch.nn.LSTM`,
    except that EvoTorch's implementations do not expect the data with extra
    leftmost dimensions for batching and for timesteps. Instead, they expect
    to receive a single input and a single current hidden state, and produce
    a single output and a single new hidden state. These recurrent layer
    implementations of EvoTorch can be used within a neural net structure
    string. Therefore, the following examples are valid:

    ```python
    rnn1 = str_to_net("RNN(4, 8) >> Linear(8, 2)")

    rnn2 = str_to_net(
        '''
        Linear(4, 10)
        >> Tanh()
        >> RNN(input_size=10, hidden_size=24, nonlinearity='tanh')
        >> Linear(24, 2)
        '''
    )

    lstm1 = str_to_net("LSTM(4, 32) >> Linear(32, 2)")

    lstm2 = str_to_net("LSTM(input_size=4, hidden_size=32) >> Linear(32, 2)")
    ```

    **Notes regarding usage with `evotorch.neuroevolution.GymNE`
    or with `evotorch.neuroevolution.VecGymNE`:**

    While instantiating a `GymNE` or a `VecGymNE`, one can specify a neural
    net structure string as the policy. Therefore, while filling the policy
    string for a `GymNE`, all these rules mentioned above apply. Additionally,
    while using `str_to_net(...)` internally, `GymNE` and `VecGymNE` define
    these extra constants:
    `obs_length` (length of the observation vector),
    `act_length` (length of the action vector for continuous-action
    environments, or number of actions for discrete-action
    environments), and
    `obs_shape` (shape of the observation as a tuple, assuming that the
    observation space is of type `gym.spaces.Box`, usable within the string
    like `obs_shape[0]`, `obs_shape[1]`, etc., or simply `obs_shape` to refer
    to the entire tuple).

    Therefore, while instantiating a `GymNE` or a `VecGymNE`, one can define a
    single-hidden-layered policy via this string:

    ```
    "Linear(obs_length, 16) >> Tanh() >> Linear(16, act_length) >> Tanh()"
    ```

    In the policy string above, one might choose to omit the last `Tanh()`, as
    `GymNE` and `VecGymNE` will clip the final output of the policy to conform
    to the action boundaries defined by the target reinforcement learning
    environment, and such a clipping operation might be seen as using an
    activation function similar to hard-tanh anyway.

    Args:
        s: The string which expresses the neural net structure.
    Returns:
        The PyTorch module of the specified structure.
    """
    s = f"(\n{s}\n)"
    return _process_expr(ast.parse(s, mode="eval").body, constants=constants)
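
Building on the examples above, the sketch below steps a recurrent policy built via `str_to_net(...)` manually, one observation at a time:

```python
import torch
from evotorch.neuroevolution.net import str_to_net

# EvoTorch's RNN layer works on a single observation and a single hidden
# state per call (no extra batch/time dimensions).
policy = str_to_net("RNN(4, 8) >> Linear(8, 2)")

obs = torch.randn(4)
act, h = policy(obs)     # first step: no hidden state yet
act, h = policy(obs, h)  # later steps: feed the hidden state back in
```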

rl

This namespace provides various reinforcement learning utilities.

ActClipWrapperModule (Module)

Source code in evotorch/neuroevolution/net/rl.py
class ActClipWrapperModule(nn.Module):
    def __init__(self, wrapped_module: nn.Module, obs_space: Box):
        super().__init__()

        device = device_of_module(wrapped_module)

        if not isinstance(obs_space, Box):
            raise TypeError(f"Unrecognized observation space: {obs_space}")

        self.wrapped_module = wrapped_module
        self.register_buffer("_low", torch.from_numpy(obs_space.low).to(device))
        self.register_buffer("_high", torch.from_numpy(obs_space.high).to(device))

    def forward(self, x: torch.Tensor, h: Any = None) -> Union[torch.Tensor, tuple]:
        if h is None:
            result = self.wrapped_module(x)
        else:
            result = self.wrapped_module(x, h)

        if isinstance(result, tuple):
            x, h = result
            got_h = True
        else:
            x = result
            h = None
            got_h = False

        x = torch.max(x, self._low)
        x = torch.min(x, self._high)

        if got_h:
            return x, h
        else:
            return x
forward(self, x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/rl.py
def forward(self, x: torch.Tensor, h: Any = None) -> Union[torch.Tensor, tuple]:
    if h is None:
        result = self.wrapped_module(x)
    else:
        result = self.wrapped_module(x, h)

    if isinstance(result, tuple):
        x, h = result
        got_h = True
    else:
        x = result
        h = None
        got_h = False

    x = torch.max(x, self._low)
    x = torch.min(x, self._high)

    if got_h:
        return x, h
    else:
        return x

AliveBonusScheduleWrapper (Wrapper)

A Wrapper which rewards the agent for being alive in a scheduled manner. This wrapper is meant to be used for non-vectorized environments.

Source code in evotorch/neuroevolution/net/rl.py
class AliveBonusScheduleWrapper(gym.Wrapper):
    """
    A Wrapper which rewards the agent for being alive in a scheduled manner.
    This wrapper is meant to be used for non-vectorized environments.
    """

    def __init__(self, env: gym.Env, alive_bonus_schedule: tuple, **kwargs):
        """
        `__init__(...)`: Initialize the AliveBonusScheduleWrapper.

        Args:
            env: Environment to wrap.
            alive_bonus_schedule: If given as a tuple `(t, b)`, an alive
                bonus `b` will be added onto all the rewards beyond the
                timestep `t`.
                If given as a tuple `(t0, t1, b)`, a partial (linearly
                increasing towards `b`) alive bonus will be added onto
                all the rewards between the timesteps `t0` and `t1`,
                and a full alive bonus (which equals to `b`) will be added
                onto all the rewards beyond the timestep `t1`.
            kwargs: Expected in the form of additional keyword arguments,
                these will be passed to the initialization method of the
                superclass.
        """
        super().__init__(env, **kwargs)
        self.__t: Optional[int] = None

        if len(alive_bonus_schedule) == 3:
            self.__t0, self.__t1, self.__bonus = (
                int(alive_bonus_schedule[0]),
                int(alive_bonus_schedule[1]),
                float(alive_bonus_schedule[2]),
            )
        elif len(alive_bonus_schedule) == 2:
            self.__t0, self.__t1, self.__bonus = (
                int(alive_bonus_schedule[0]),
                int(alive_bonus_schedule[0]),
                float(alive_bonus_schedule[1]),
            )
        else:
            raise ValueError(
                f"The argument `alive_bonus_schedule` was expected to have 2 or 3 elements."
                f" However, its value is {repr(alive_bonus_schedule)} (having {len(alive_bonus_schedule)} elements)."
            )

        if self.__t1 > self.__t0:
            self.__gap = self.__t1 - self.__t0
        else:
            self.__gap = None

    def reset(self, *args, **kwargs):
        self.__t = 0
        return self.env.reset(*args, **kwargs)

    def step(self, action) -> tuple:
        step_result = self.env.step(action)
        self.__t += 1

        observation = step_result[0]
        reward = step_result[1]
        rest = step_result[2:]

        if self.__t >= self.__t1:
            reward = reward + self.__bonus
        elif (self.__gap is not None) and (self.__t >= self.__t0):
            reward = reward + ((self.__t - self.__t0) / self.__gap) * self.__bonus

        return (observation, reward) + rest
__init__(self, env, alive_bonus_schedule, **kwargs) special

__init__(...): Initialize the AliveBonusScheduleWrapper.

Parameters:

Name Type Description Default
env Env

Environment to wrap.

required
alive_bonus_schedule tuple

If given as a tuple (t, b), an alive bonus b will be added onto all the rewards beyond the timestep t. If given as a tuple (t0, t1, b), a partial (linearly increasing towards b) alive bonus will be added onto all the rewards between the timesteps t0 and t1, and a full alive bonus (which equals to b) will be added onto all the rewards beyond the timestep t1.

required
kwargs

Expected in the form of additional keyword arguments, these will be passed to the initialization method of the superclass.

{}
Source code in evotorch/neuroevolution/net/rl.py
def __init__(self, env: gym.Env, alive_bonus_schedule: tuple, **kwargs):
    """
    `__init__(...)`: Initialize the AliveBonusScheduleWrapper.

    Args:
        env: Environment to wrap.
        alive_bonus_schedule: If given as a tuple `(t, b)`, an alive
            bonus `b` will be added onto all the rewards beyond the
            timestep `t`.
            If given as a tuple `(t0, t1, b)`, a partial (linearly
            increasing towards `b`) alive bonus will be added onto
            all the rewards between the timesteps `t0` and `t1`,
            and a full alive bonus (which equals to `b`) will be added
            onto all the rewards beyond the timestep `t1`.
        kwargs: Expected in the form of additional keyword arguments,
            these will be passed to the initialization method of the
            superclass.
    """
    super().__init__(env, **kwargs)
    self.__t: Optional[int] = None

    if len(alive_bonus_schedule) == 3:
        self.__t0, self.__t1, self.__bonus = (
            int(alive_bonus_schedule[0]),
            int(alive_bonus_schedule[1]),
            float(alive_bonus_schedule[2]),
        )
    elif len(alive_bonus_schedule) == 2:
        self.__t0, self.__t1, self.__bonus = (
            int(alive_bonus_schedule[0]),
            int(alive_bonus_schedule[0]),
            float(alive_bonus_schedule[1]),
        )
    else:
        raise ValueError(
            f"The argument `alive_bonus_schedule` was expected to have 2 or 3 elements."
            f" However, its value is {repr(alive_bonus_schedule)} (having {len(alive_bonus_schedule)} elements)."
        )

    if self.__t1 > self.__t0:
        self.__gap = self.__t1 - self.__t0
    else:
        self.__gap = None
reset(self, *args, **kwargs)

Uses the reset() of the env, which can be overwritten to change the returned data.

Source code in evotorch/neuroevolution/net/rl.py
def reset(self, *args, **kwargs):
    self.__t = 0
    return self.env.reset(*args, **kwargs)
step(self, action)

Uses the step() of the env, which can be overwritten to change the returned data.

Source code in evotorch/neuroevolution/net/rl.py
def step(self, action) -> tuple:
    step_result = self.env.step(action)
    self.__t += 1

    observation = step_result[0]
    reward = step_result[1]
    rest = step_result[2:]

    if self.__t >= self.__t1:
        reward = reward + self.__bonus
    elif (self.__gap is not None) and (self.__t >= self.__t0):
        reward = reward + ((self.__t - self.__t0) / self.__gap) * self.__bonus

    return (observation, reward) + rest
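
A brief usage sketch (the environment and schedule values are illustrative):

```python
import gymnasium as gym
from evotorch.neuroevolution.net.rl import AliveBonusScheduleWrapper

# Between timesteps 50 and 100, add a linearly growing fraction of the
# bonus to each reward; from timestep 100 onward, add the full +1.0.
env = AliveBonusScheduleWrapper(gym.make("CartPole-v1"), (50, 100, 1.0))
```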

ObsNormWrapperModule (Module)

Source code in evotorch/neuroevolution/net/rl.py
class ObsNormWrapperModule(nn.Module):
    def __init__(self, wrapped_module: nn.Module, rn: Union[RunningStat, RunningNorm]):
        super().__init__()

        device = device_of_module(wrapped_module)
        self.wrapped_module = wrapped_module

        with torch.no_grad():
            normalizer = deepcopy(rn.to_layer()).to(device)
        self.normalizer = normalizer

    def forward(self, x: torch.Tensor, h: Any = None) -> Union[torch.Tensor, tuple]:
        x = self.normalizer(x)

        if h is None:
            result = self.wrapped_module(x)
        else:
            result = self.wrapped_module(x, h)

        if isinstance(result, tuple):
            x, h = result
            got_h = True
        else:
            x = result
            h = None
            got_h = False

        if got_h:
            return x, h
        else:
            return x
forward(self, x, h=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/rl.py
def forward(self, x: torch.Tensor, h: Any = None) -> Union[torch.Tensor, tuple]:
    x = self.normalizer(x)

    if h is None:
        result = self.wrapped_module(x)
    else:
        result = self.wrapped_module(x, h)

    if isinstance(result, tuple):
        x, h = result
        got_h = True
    else:
        x = result
        h = None
        got_h = False

    if got_h:
        return x, h
    else:
        return x
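
Below is a minimal sketch of wrapping a policy so that observations are normalized before reaching it, assuming stats were already collected into a `RunningNorm`:

```python
import torch
from torch import nn
from evotorch.neuroevolution.net.rl import ObsNormWrapperModule
from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=4, dtype="float32")
rn.update(torch.randn(32, 4))  # collect stats from a batch of observations

policy = nn.Linear(4, 2)
wrapped = ObsNormWrapperModule(policy, rn)
action = wrapped(torch.randn(4))  # input is normalized, then fed to policy
```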

reset_env(env)

Reset a gymnasium environment.

Even though the gymnasium library switched to a new API where the reset() method returns a tuple (observation, info), this function follows the conventions of the classical gym library and returns only the observation of the newly reset environment.

Parameters:

Name Type Description Default
env Env

The gymnasium environment which will be reset.

required

Returns:

Type Description
Iterable

The initial observation

Source code in evotorch/neuroevolution/net/rl.py
def reset_env(env: gym.Env) -> Iterable:
    """
    Reset a gymnasium environment.

    Even though the `gymnasium` library switched to a new API where the
    `reset()` method returns a tuple `(observation, info)`, this function
    follows the conventions of the classical `gym` library and returns
    only the observation of the newly reset environment.

    Args:
        env: The gymnasium environment which will be reset.
    Returns:
        The initial observation
    """
    result = env.reset()
    if isinstance(result, tuple) and (len(result) == 2):
        result = result[0]
    return result

take_step_in_env(env, action)

Take a step in the gymnasium environment. Taking a step means performing the action provided via the arguments.

Even though the gymnasium library switched to a new API where the step() method returns a 5-element tuple of the form (observation, reward, terminated, truncated, info), this function follows the conventions of the classical gym library and returns a 4-element tuple (observation, reward, done, info).

Parameters:

Name Type Description Default
env Env

The gymnasium environment in which the action will be performed.

required
action Iterable

The action to be performed.

required

Returns:

Type Description
tuple

A tuple in the form (observation, reward, done, info) where observation is the observation received after performing the action, reward is the amount of reward gained, done is a boolean value indicating whether or not the episode has ended, and info is additional information (usually as a dictionary).

Source code in evotorch/neuroevolution/net/rl.py
def take_step_in_env(env: gym.Env, action: Iterable) -> tuple:
    """
    Take a step in the gymnasium environment.
    Taking a step means performing the action provided via the arguments.

    Even though the `gymnasium` library switched to a new API where the
    `step()` method returns a 5-element tuple of the form
    `(observation, reward, terminated, truncated, info)`, this function
    follows the conventions of the classical `gym` library and returns
    a 4-element tuple `(observation, reward, done, info)`.

    Args:
        env: The gymnasium environment in which the action will be performed.
        action: The action to be performed.
    Returns:
        A tuple in the form `(observation, reward, done, info)` where
        `observation` is the observation received after performing the action,
        `reward` is the amount of reward gained,
        `done` is a boolean value indicating whether or not the episode has
        ended, and
        `info` is additional information (usually as a dictionary).
    """
    result = env.step(action)
    if isinstance(result, tuple):
        n = len(result)
        if n == 4:
            observation, reward, done, info = result
        elif n == 5:
            observation, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            raise ValueError(
                f"The result of the `step(...)` method of the gym environment"
                f" was expected as a tuple of length 4 or 5."
                f" However, the received result is {repr(result)}, which is"
                f" of length {len(result)}."
            )
    else:
        raise TypeError(
            f"The result of the `step(...)` method of the gym environment"
            f" was expected as a tuple of length 4 or 5."
            f" However, the received result is {repr(result)}, which is"
            f" of type {type(result)}."
        )
    return observation, reward, done, info
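
The sketch below runs one episode through these compatibility helpers (the environment name is illustrative):

```python
import gymnasium as gym
from evotorch.neuroevolution.net.rl import reset_env, take_step_in_env

env = gym.make("CartPole-v1")
obs = reset_env(env)  # observation only, even under the new gymnasium API
done = False
total_reward = 0.0
while not done:
    # Classic 4-tuple, with done = (terminated or truncated) under gymnasium.
    obs, reward, done, info = take_step_in_env(env, env.action_space.sample())
    total_reward += reward
```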

runningnorm

CollectedStats (tuple)

CollectedStats(mean, stdev)

__getnewargs__(self) special

Return self as a plain tuple. Used by copy and pickle.

Source code in evotorch/neuroevolution/net/runningnorm.py
def __getnewargs__(self):
    'Return self as a plain tuple.  Used by copy and pickle.'
    return _tuple(self)
__new__(_cls, mean, stdev) special staticmethod

Create new instance of CollectedStats(mean, stdev)

__repr__(self) special

Return a nicely formatted representation string

Source code in evotorch/neuroevolution/net/runningnorm.py
def __repr__(self):
    'Return a nicely formatted representation string'
    return self.__class__.__name__ + repr_fmt % self

ObsNormLayer (Module)

An observation normalizer which behaves as a PyTorch Module.

Source code in evotorch/neuroevolution/net/runningnorm.py
class ObsNormLayer(nn.Module):
    """
    An observation normalizer which behaves as a PyTorch Module.
    """

    def __init__(
        self, mean: torch.Tensor, stdev: torch.Tensor, low: Optional[float] = None, high: Optional[float] = None
    ) -> None:
        """
        `__init__(...)`: Initialize the ObsNormLayer.

        Args:
            mean: The mean according to which the observations are to be
                normalized.
            stdev: The standard deviation according to which the observations
                are to be normalized.
            low: Optionally a real number if the result of the normalization
                is to be clipped. Represents the lower bound for the clipping
                operation.
            high: Optionally a real number if the result of the normalization
                is to be clipped. Represents the upper bound for the clipping
                operation.
        """
        super().__init__()
        self.register_buffer("_mean", mean)
        self.register_buffer("_stdev", stdev)
        self._lb = None if low is None else float(low)
        self._ub = None if high is None else float(high)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Normalize an observation or a batch of observations.

        Args:
            x: The observation(s).
        Returns:
            The normalized counterpart of the observation(s).
        """
        return _clamp((x - self._mean) / self._stdev, self._lb, self._ub)
__init__(self, mean, stdev, low=None, high=None) special

__init__(...): Initialize the ObsNormLayer.

Parameters:

Name Type Description Default
mean Tensor

The mean according to which the observations are to be normalized.

required
stdev Tensor

The standard deviation according to which the observations are to be normalized.

required
low Optional[float]

Optionally a real number if the result of the normalization is to be clipped. Represents the lower bound for the clipping operation.

None
high Optional[float]

Optionally a real number if the result of the normalization is to be clipped. Represents the upper bound for the clipping operation.

None
Source code in evotorch/neuroevolution/net/runningnorm.py
def __init__(
    self, mean: torch.Tensor, stdev: torch.Tensor, low: Optional[float] = None, high: Optional[float] = None
) -> None:
    """
    `__init__(...)`: Initialize the ObsNormLayer.

    Args:
        mean: The mean according to which the observations are to be
            normalized.
        stdev: The standard deviation according to which the observations
            are to be normalized.
        low: Optionally a real number if the result of the normalization
            is to be clipped. Represents the lower bound for the clipping
            operation.
        high: Optionally a real number if the result of the normalization
            is to be clipped. Represents the upper bound for the clipping
            operation.
    """
    super().__init__()
    self.register_buffer("_mean", mean)
    self.register_buffer("_stdev", stdev)
    self._lb = None if low is None else float(low)
    self._ub = None if high is None else float(high)
forward(self, x)

Normalize an observation or a batch of observations.

Parameters:

Name Type Description Default
x Tensor

The observation(s).

required

Returns:

Type Description
Tensor

The normalized counterpart of the observation(s).

Source code in evotorch/neuroevolution/net/runningnorm.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Normalize an observation or a batch of observations.

    Args:
        x: The observation(s).
    Returns:
        The normalized counterpart of the observation(s).
    """
    return _clamp((x - self._mean) / self._stdev, self._lb, self._ub)
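
A small sketch of using the layer directly with fixed, illustrative statistics:

```python
import torch
from evotorch.neuroevolution.net.runningnorm import ObsNormLayer

layer = ObsNormLayer(mean=torch.zeros(3), stdev=torch.ones(3), low=-5.0, high=5.0)
x = torch.tensor([0.5, -2.0, 10.0])
print(layer(x))  # tensor([ 0.5000, -2.0000,  5.0000]); the last value is clipped
```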

RunningNorm

An online observation normalization tool

Source code in evotorch/neuroevolution/net/runningnorm.py
class RunningNorm:
    """
    An online observation normalization tool
    """

    def __init__(
        self,
        *,
        shape: Union[tuple, int],
        dtype: DType,
        device: Optional[Device] = None,
        min_variance: float = 1e-2,
        clip: Optional[tuple] = None,
    ) -> None:
        """
        `__init__(...)`: Initialize the RunningNorm

        Args:
            shape: Observation shape. Can be an integer or a tuple.
            dtype: The dtype of the observations.
            device: The device in which the observation stats are held.
                If left as None, the device is assumed to be "cpu".
            min_variance: A lower bound for the variance to be used in
                the normalization computations.
                In other words, if the computed variance according to the
                collected observations ends up lower than `min_variance`,
                this `min_variance` will be used instead (in an elementwise
                manner) while computing the normalized observations.
                As in Salimans et al. (2017), the default is 1e-2.
            clip: Can be left as None (which is the default), or can be
                given as a pair of real numbers.
                This is used for clipping the observations after the
                normalization operation.
                In Salimans et al. (2017), (-5.0, +5.0) was used.
        """

        # Make sure that the shape is stored as a torch.Size object.
        if isinstance(shape, Iterable):
            self._shape = torch.Size(shape)
        else:
            self._shape = torch.Size([int(shape)])

        # Store the number of dimensions
        self._ndim = len(self._shape)

        # Store the dtype and the device
        self._dtype = to_torch_dtype(dtype)
        self._device = "cpu" if device is None else device

        # Initialize the internally stored data as empty
        self._sum: Optional[torch.Tensor] = None
        self._sum_of_squares: Optional[torch.Tensor] = None
        self._count: int = 0

        # Store the minimum variance
        self._min_variance = float(min_variance)

        if clip is not None:
            # If a clip tuple was provided, store the specified lower and upper bounds
            lb, ub = clip
            self._lb = float(lb)
            self._ub = float(ub)
        else:
            # If a clip tuple was not provided the bounds are stored as None
            self._lb = None
            self._ub = None

    def to(self, device: Device) -> "RunningNorm":
        """
        If the target device is a different device, then make a copy of this
        RunningNorm instance on the target device.
        If the target device is the same with this RunningNorm's device, then
        return this RunningNorm itself.

        Args:
            device: The target device.
        Returns:
            The RunningNorm on the target device. This can be a copy, or the
            original RunningNorm instance itself.
        """
        if torch.device(device) == torch.device(self.device):
            return self
        else:
            new_running_norm = object.__new__(type(self))

            already_handled = {"_sum", "_sum_of_squares", "_device"}
            new_running_norm._sum = self._sum.to(device)
            new_running_norm._sum_of_squares = self._sum_of_squares.to(device)
            new_running_norm._device = device

            for k, v in self.__dict__.items():
                if k not in already_handled:
                    setattr(new_running_norm, k, deepcopy(v))

            return new_running_norm

    @property
    def device(self) -> Device:
        """
        The device in which the observation stats are held
        """
        return self._device

    @property
    def dtype(self) -> DType:
        """
        The dtype of the stored observation stats
        """
        return self._dtype

    @property
    def shape(self) -> tuple:
        """
        Observation shape
        """
        return self._shape

    @property
    def min_variance(self) -> float:
        """
        Minimum variance
        """
        return self._min_variance

    @property
    def low(self) -> Optional[float]:
        """
        The lower component of the bounds given in the `clip` tuple.
        If `clip` was initialized as None, this is also None.
        """
        return self._lb

    @property
    def high(self) -> Optional[float]:
        """
        The higher (upper) component of the bounds given in the `clip` tuple.
        If `clip` was initialized as None, this is also None.
        """
        return self._ub

    def _like_its_own(self, x: Iterable) -> torch.Tensor:
        return torch.as_tensor(x, dtype=self._dtype, device=self._device)

    def _verify(self, x: Iterable) -> torch.Tensor:
        x = self._like_its_own(x)
        if x.ndim == self._ndim:
            if x.shape != self._shape:
                raise ValueError(
                    f"This RunningNorm instance was initialized with shape: {self._shape}."
                    f" However, the provided tensor has an incompatible shape: {x._shape}."
                )
        elif x.ndim == (self._ndim + 1):
            if x.shape[1:] != self._shape:
                raise ValueError(
                    f"This RunningNorm instance was initialized with shape: {self._shape}."
                    f" The provided tensor is shaped {x.shape}."
                    f" Accepting the tensor's leftmost dimension as the batch size,"
                    f" the remaining shape is incompatible: {x.shape[1:]}"
                )
        else:
            raise ValueError(
                f"This RunningNorm instance was initialized with shape: {self._shape}."
                f" The provided tensor is shaped {x.shape}."
                f" The number of dimensions of the given tensor is incompatible."
            )
        return x

    def _has_no_data(self) -> bool:
        return (self._sum is None) and (self._sum_of_squares is None) and (self._count == 0)

    def _has_data(self) -> bool:
        return (self._sum is not None) and (self._sum_of_squares is not None) and (self._count > 0)

    def reset(self):
        """
        Remove all the collected observation data.
        """
        self._sum = None
        self._sum_of_squares = None
        self._count = 0

    @torch.no_grad()
    def update(self, x: Union[Iterable, "RunningNorm"], mask: Optional[Iterable] = None, *, verify: bool = True):
        """
        Update the stored stats with new observation data.

        Args:
            x: The new observation(s), as a PyTorch tensor, or any Iterable
                that can be converted to a PyTorch tensor, or another
                RunningNorm instance.
                If given as a tensor or as an Iterable, the shape of `x` can
                be the same as the observation shape, or it can be augmented
                with an extra leftmost dimension.
                In the case of augmented dimension, `x` is interpreted not as
                a single observation, but as a batch of observations.
                If `x` is another RunningNorm instance, the stats stored by
                this RunningNorm instance will be updated with all the data
                stored by `x`.
            mask: Can be given as a 1-dimensional Iterable of booleans ONLY
                if `x` represents a batch of observations.
                If a `mask` is provided, the i-th observation within the
                observation batch `x` will be taken into account only if
                the i-th item of the `mask` is True.
            verify: Whether or not to verify the shape of the given Iterable
                objects. The default is True.
        """
        if isinstance(x, RunningNorm):
            # If we are to update our stats according to another RunningNorm instance

            if x._count > 0:
                # We bother only if x is non-empty

                if mask is not None:
                    # We were given another RunningNorm, not a batch of observations.
                    # So, we do not expect to receive a mask tensor.
                    # If a mask was provided, then this is an unexpected way of calling this function.
                    # We therefore raise an error.
                    raise ValueError(
                        "The `mask` argument is expected as None if the first argument is a RunningNorm."
                        " However, `mask` is found as something other than None."
                    )

                if self._shape != x._shape:
                    # If the shapes of this RunningNorm and of the other RunningNorm
                    # do not match, then we cannot use `x` for updating our stats.
                    # It might be the case that `x` was initialized for another
                    # task, with differently sized observations.
                    # We therefore raise an error.
                    raise ValueError(
                        f"The RunningNorm to be updated has the shape {self._shape}."
                        f" The other RunningNorm has the shape {x._shape}."
                        f" These shapes are incompatible."
                    )

                if self._has_no_data():
                    # If this RunningNorm has no data at all, then we clone the
                    # data of x.
                    self._sum = self._like_its_own(x._sum.clone())
                    self._sum_of_squares = self._like_its_own(x._sum_of_squares.clone())
                    self._count = x._count
                elif self._has_data():
                    # If this RunningNorm has its own data, then we update the
                    # stored data with the data stored by x.
                    self._sum += self._like_its_own(x._sum)
                    self._sum_of_squares += self._like_its_own(x._sum_of_squares)
                    self._count += x._count
                else:
                    assert False, "RunningNorm is in an invalid state! This might be a bug."
        else:
            # This is the case where the received argument x is not a
            # RunningNorm object, but an Iterable.

            if verify:
                # If we have the `verify` flag, then we make sure that
                # x is a tensor of the correct shape
                x = self._verify(x)

            if x.ndim == self._ndim:
                # If the shape of x is exactly the same with the observation shape
                # then we assume that x represents a single observation, and not a
                # batch of observations.

                if mask is not None:
                    # Since we are dealing with a single observation,
                    # we do not expect to receive a mask argument.
                    # If the mask argument was provided, then this is an unexpected
                    # usage of this function.
                    # We therefore raise an error.
                    raise ValueError(
                        "The `mask` argument is expected as None if the first argument is a single observation"
                        " (i.e. not a batch of observations, with an extra leftmost dimension)."
                        " However, `mask` is found as something other than None."
                    )

                # Since x is a single observation,
                # the sum of observations extracted from x is x itself,
                # and the sum of squared observations extracted from x is
                # the square of x itself.
                sum_of_x = x
                sum_of_x_squared = x.square()
                # We extracted a single observation from x
                n = 1
            elif x.ndim == (self._ndim + 1):
                # If the number of dimensions of x is one more than the number
                # of dimensions of this RunningNorm, then we assume that x is a batch
                # of observations.

                if mask is not None:
                    # If a mask is provided, then we first make sure that it is a tensor
                    # of dtype bool in the correct device.
                    mask = torch.as_tensor(mask, dtype=torch.bool, device=self._device)

                    if mask.ndim != 1:
                        # We expect the mask to be 1-dimensional.
                        # If not, we raise an error.
                        raise ValueError(
                            f"The `mask` tensor was expected as a 1-dimensional tensor."
                            f" However, its shape is {mask.shape}."
                        )

                    if len(mask) != x.shape[0]:
                        # If the length of the mask is not the batch size of x,
                        # then there is a mismatch.
                        # We therefore raise an error.
                        raise ValueError(
                            f"The shape of the given tensor is {x.shape}."
                            f" Therefore, the batch size of observations is {x.shape[0]}."
                            f" However, the given `mask` tensor does not has an incompatible length: {len(mask)}."
                        )

                    # We compute how many True items we have in the mask.
                    # This integer gives us how many observations we extract from x.
                    n = int(torch.sum(torch.as_tensor(mask, dtype=torch.int64, device=self._device)))

                    # We now re-cast the mask as the observation dtype (so that True items turn to 1.0
                    # and False items turn to 0.0), and then increase its number of dimensions so that
                    # it can operate directly with x.
                    mask = self._like_its_own(mask).reshape(torch.Size([x.shape[0]] + ([1] * (x.ndim - 1))))

                    # Finally, we multiply x with the mask. This means that the observations with corresponding
                    # mask values as False are zeroed out.
                    x = x * mask
                else:
                    # This is the case where we did not receive a mask.
                    # We can simply say that the number of observations to extract from x
                    # is the size of its leftmost dimension, i.e. the batch size.
                    n = x.shape[0]

                # With or without a mask, we are now ready to extract the sum and sum of squares
                # from x.
                sum_of_x = torch.sum(x, dim=0)
                sum_of_x_squared = torch.sum(x.square(), dim=0)
            else:
                # This is the case where the number of dimensions of x is unrecognized.
                # This case is actually already checked by the _verify(...) method earlier.
                # This defensive fallback case is only for when verify=False and it turned out
                # that the ndim is invalid.
                raise ValueError(f"Invalid shape: {x.shape}")

            # At this point, we handled all the valid cases regarding the Iterable x,
            # and we have our sum_of_x (sum of all observations), sum_of_squares
            # (sum of all squared observations), and n (number of observations extracted
            # from x).

            if self._has_no_data():
                # If our RunningNorm is empty, the observation data we extracted from x
                # become our RunningNorm's new data.
                self._sum = sum_of_x
                self._sum_of_squares = sum_of_x_squared
                self._count = n
            elif self._has_data():
                # If our RunningNorm is not empty, the stored data is updated with the
                # data extracted from x.
                self._sum += sum_of_x
                self._sum_of_squares += sum_of_x_squared
                self._count += n
            else:
                # This is an erroneous state where the internal data looks neither
                # existent nor completely empty.
                # This might be the result of a bug, or maybe this instance's
                # protected variables were tampered with from the outside.
                assert False, "RunningNorm is in an invalid state! This might be a bug."

    @property
    @torch.no_grad()
    def stats(self) -> CollectedStats:
        """
        The collected data's mean and standard deviation (stdev) in a tuple
        """

        # Using the internally stored sum, sum_of_squares, and count,
        # compute E[x] and E[x^2]
        E_x = self._sum / self._count
        E_x2 = self._sum_of_squares / self._count

        # The mean is E[x]
        mean = E_x

        # The variance is E[x^2] - (E[x])^2, elementwise clipped such that
        # it cannot go below min_variance
        variance = _clamp(E_x2 - E_x.square(), self._min_variance, None)

        # Standard deviation is finally computed as the square root of the variance
        stdev = torch.sqrt(variance)

        # Return the stats in a named tuple
        return CollectedStats(mean=mean, stdev=stdev)

    @property
    def mean(self) -> torch.Tensor:
        """
        The collected data's mean
        """
        return self._sum / self._count

    @property
    def stdev(self) -> torch.Tensor:
        """
        The collected data's standard deviation
        """
        return self.stats.stdev

    @property
    def sum(self) -> torch.Tensor:
        """
        The collected data's sum
        """
        return self._sum

    @property
    def sum_of_squares(self) -> torch.Tensor:
        """
        Sum of squares of the collected data
        """
        return self._sum_of_squares

    @property
    def count(self) -> int:
        """
        Number of observations encountered
        """
        return self._count

    @torch.no_grad()
    def normalize(self, x: Iterable, *, result_as_numpy: Optional[bool] = None, verify: bool = True) -> Iterable:
        """
        Normalize the given observation x.

        Args:
            x: The observation(s), as a PyTorch tensor, or any Iterable
                that is convertible to a PyTorch tensor.
                `x` can be a single observation, or it can be a batch
                of observations (with an extra leftmost dimension).
            result_as_numpy: Whether or not to return the normalized
                observation as a numpy array.
                If left as None (which is the default), then the returned
                type depends on x: a PyTorch tensor is returned if x is a
                PyTorch tensor, and a numpy array is returned otherwise.
                If True, the result is always a numpy array.
                If False, the result is always a PyTorch tensor.
            verify: Whether or not to check the type and dimensions of x.
                This is True by default.
                Note that, if `verify` is False, this function will not
                properly check the type of `x` and will assume that `x`
                is a PyTorch tensor.
        Returns:
            The normalized observation, as a PyTorch tensor or a numpy array.
        """

        if self._count == 0:
            # If this RunningNorm instance has no data yet,
            # then we do not know how to do the normalization.
            # We therefore raise an error.
            raise ValueError("Cannot do normalization because no data is collected yet.")

        if verify:
            # Here we verify the type and shape of x.

            if result_as_numpy is None:
                # If there is not an explicit request about the return type,
                # we infer the return type from the type of x:
                # if x is a tensor, we return a tensor;
                # otherwise, we assume x to be a CPU-bound iterable, and
                # therefore we return a numpy array.
                result_as_numpy = not isinstance(x, torch.Tensor)
            else:
                result_as_numpy = bool(result_as_numpy)

            # We call _verify() to make sure that x is of correct shape
            # and is properly converted to a PyTorch tensor.
            x = self._verify(x)

        # We get the mean and stdev of the collected data
        mean, stdev = self.stats

        # Now we compute the normalized observation, clipped according to the
        # lower and upper bounds expressed by the `clip` tuple, if exists.
        result = _clamp((x - mean) / stdev, self._lb, self._ub)

        if result_as_numpy:
            # If we are to return the result as a numpy array, we do the
            # necessary conversion.
            result = result.cpu().numpy()

        # Finally, return the result
        return result

    @torch.no_grad()
    def update_and_normalize(self, x: Iterable, mask: Optional[Iterable] = None) -> Iterable:
        """
        Update the observation stats according to x, then normalize x.

        Args:
            x: The observation(s), as a PyTorch tensor, or as an Iterable
                which can be converted to a PyTorch tensor.
                The shape of x can be the same as the observation shape,
                or it can be augmented with an extra leftmost dimension
                to express a batch of observations.
            mask: Can be given as a 1-dimensional Iterable of booleans ONLY
                if `x` represents a batch of observations.
                If a `mask` is provided, the i-th observation within the
                observation batch `x` will be taken into account only if
                the i-th item of the `mask` is True.
        Returns:
            The normalized counterpart of the observation(s) expressed by x.
        """
        result_as_numpy = not isinstance(x, torch.Tensor)
        x = self._verify(x)

        self.update(x, mask, verify=False)
        result = self.normalize(x, verify=False)

        if result_as_numpy:
            result = result.cpu().numpy()

        return result

    def to_layer(self) -> "ObsNormLayer":
        """
        Make a PyTorch module which normalizes its inputs.

        Returns:
            An ObsNormLayer instance.
        """
        mean, stdev = self.stats
        low = self.low
        high = self.high
        return ObsNormLayer(mean=mean, stdev=stdev, low=low, high=high)

    def __repr__(self) -> str:
        return f"<{self.__class__.__name__}, count: {self.count}>"

    def __copy__(self) -> "RunningNorm":
        return deepcopy(self)
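
A brief usage sketch tying the pieces above together (shapes and data are illustrative):

```python
import torch
from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=3, dtype="float32", clip=(-5.0, 5.0))
rn.update(torch.randn(100, 3))  # a batch: the leftmost dimension is batch size

normalized = rn.normalize(torch.randn(3))  # normalize a single observation
layer = rn.to_layer()  # freeze the current stats into an ObsNormLayer
```
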
count: int property readonly

Number of observations encountered

device: Union[str, torch.device] property readonly

The device in which the observation stats are held

dtype: Union[str, torch.dtype, numpy.dtype, Type] property readonly

The dtype of the stored observation stats

high: Optional[float] property readonly

The higher (upper) component of the bounds given in the clip tuple. If clip was initialized as None, this is also None.

low: Optional[float] property readonly

The lower component of the bounds given in the clip tuple. If clip was initialized as None, this is also None.

mean: Tensor property readonly

The collected data's mean

min_variance: float property readonly

Minimum variance

shape: tuple property readonly

Observation shape

stats: CollectedStats property readonly

The collected data's mean and standard deviation (stdev) in a tuple

stdev: Tensor property readonly

The collected data's standard deviation

sum: Tensor property readonly

The collected data's sum

sum_of_squares: Tensor property readonly

Sum of squares of the collected data

__init__(self, *, shape, dtype, device=None, min_variance=0.01, clip=None) special

__init__(...): Initialize the RunningNorm

Parameters:

Name Type Description Default
shape Union[tuple, int]

Observation shape. Can be an integer or a tuple.

required
dtype Union[str, torch.dtype, numpy.dtype, Type]

The dtype of the observations.

required
device Union[str, torch.device]

The device in which the observation stats are held. If left as None, the device is assumed to be "cpu".

None
min_variance float

A lower bound for the variance to be used in the normalization computations. In other words, if the computed variance according to the collected observations ends up lower than min_variance, this min_variance will be used instead (in an elementwise manner) while computing the normalized observations. As in Salimans et al. (2017), the default is 1e-2.

0.01
clip Optional[tuple]

Can be left as None (which is the default), or can be given as a pair of real numbers. This is used for clipping the observations after the normalization operation. In Salimans et al. (2017), (-5.0, +5.0) was used.

None
Source code in evotorch/neuroevolution/net/runningnorm.py
def __init__(
    self,
    *,
    shape: Union[tuple, int],
    dtype: DType,
    device: Optional[Device] = None,
    min_variance: float = 1e-2,
    clip: Optional[tuple] = None,
) -> None:
    """
    `__init__(...)`: Initialize the RunningNorm

    Args:
        shape: Observation shape. Can be an integer or a tuple.
        dtype: The dtype of the observations.
        device: The device in which the observation stats are held.
            If left as None, the device is assumed to be "cpu".
        min_variance: A lower bound for the variance to be used in
            the normalization computations.
            In other words, if the computed variance according to the
            collected observations ends up lower than `min_variance`,
            this `min_variance` will be used instead (in an elementwise
            manner) while computing the normalized observations.
            As in Salimans et al. (2017), the default is 1e-2.
        clip: Can be left as None (which is the default), or can be
            given as a pair of real numbers.
            This is used for clipping the observations after the
            normalization operation.
            In Salimans et al. (2017), (-5.0, +5.0) was used.
    """

    # Make sure that the shape is stored as a torch.Size object.
    if isinstance(shape, Iterable):
        self._shape = torch.Size(shape)
    else:
        self._shape = torch.Size([int(shape)])

    # Store the number of dimensions
    self._ndim = len(self._shape)

    # Store the dtype and the device
    self._dtype = to_torch_dtype(dtype)
    self._device = "cpu" if device is None else device

    # Initialize the internally stored data as empty
    self._sum: Optional[torch.Tensor] = None
    self._sum_of_squares: Optional[torch.Tensor] = None
    self._count: int = 0

    # Store the minimum variance
    self._min_variance = float(min_variance)

    if clip is not None:
        # If a clip tuple was provided, store the specified lower and upper bounds
        lb, ub = clip
        self._lb = float(lb)
        self._ub = float(ub)
    else:
        # If a clip tuple was not provided the bounds are stored as None
        self._lb = None
        self._ub = None
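
To make the roles of min_variance and clip concrete, here is a small hedged sketch (illustrative values, not from the library's documentation):

import torch

from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=2, dtype="float32", min_variance=1e-2, clip=(-5.0, 5.0))

# Feature 0 varies; feature 1 is constant across the batch.
data = torch.stack([torch.randn(1000), torch.full((1000,), 3.0)], dim=1)
rn.update(data)

# For feature 1 the computed variance is (near) zero, so min_variance=1e-2
# takes over, flooring the stdev at sqrt(1e-2) = 0.1 instead of dividing by ~0.
# The clip pair then bounds every normalized value to [-5.0, +5.0].
out = rn.normalize(torch.tensor([100.0, 3.0]))
assert float(out.max()) <= 5.0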
normalize(self, x, *, result_as_numpy=None, verify=True)

Normalize the given observation x.

Parameters:

Name Type Description Default
x Iterable

The observation(s), as a PyTorch tensor, or any Iterable that is convertible to a PyTorch tensor. x can be a single observation, or it can be a batch of observations (with an extra leftmost dimension).

required
result_as_numpy Optional[bool]

Whether or not to return the normalized observation as a numpy array. If left as None (which is the default), then the returned type depends on x: a PyTorch tensor is returned if x is a PyTorch tensor, and a numpy array is returned otherwise. If True, the result is always a numpy array. If False, the result is always a PyTorch tensor.

None
verify bool

Whether or not to check the type and dimensions of x. This is True by default. Note that, if verify is False, this function will not properly check the type of x and will assume that x is a PyTorch tensor.

True

Returns:

Type Description
Iterable

The normalized observation, as a PyTorch tensor or a numpy array.

Source code in evotorch/neuroevolution/net/runningnorm.py
@torch.no_grad()
def normalize(self, x: Iterable, *, result_as_numpy: Optional[bool] = None, verify: bool = True) -> Iterable:
    """
    Normalize the given observation x.

    Args:
        x: The observation(s), as a PyTorch tensor, or any Iterable
            that is convertible to a PyTorch tensor.
            `x` can be a single observation, or it can be a batch
            of observations (with an extra leftmost dimension).
        result_as_numpy: Whether or not to return the normalized
            observation as a numpy array.
            If left as None (which is the default), then the returned
            type depends on x: a PyTorch tensor is returned if x is a
            PyTorch tensor, and a numpy array is returned otherwise.
            If True, the result is always a numpy array.
            If False, the result is always a PyTorch tensor.
        verify: Whether or not to check the type and dimensions of x.
            This is True by default.
            Note that, if `verify` is False, this function will not
            properly check the type of `x` and will assume that `x`
            is a PyTorch tensor.
    Returns:
        The normalized observation, as a PyTorch tensor or a numpy array.
    """

    if self._count == 0:
        # If this RunningNorm instance has no data yet,
        # then we do not know how to do the normalization.
        # We therefore raise an error.
        raise ValueError("Cannot do normalization because no data is collected yet.")

    if verify:
        # Here we verify the type and shape of x.

        if result_as_numpy is None:
            # If there is not an explicit request about the return type,
            # we infer the return type from the type of x:
            # if x is a tensor, we return a tensor;
            # otherwise, we assume x to be a CPU-bound iterable, and
            # therefore we return a numpy array.
            result_as_numpy = not isinstance(x, torch.Tensor)
        else:
            result_as_numpy = bool(result_as_numpy)

        # We call _verify() to make sure that x is of correct shape
        # and is properly converted to a PyTorch tensor.
        x = self._verify(x)

    # We get the mean and stdev of the collected data
    mean, stdev = self.stats

    # Now we compute the normalized observation, clipped according to the
    # lower and upper bounds expressed by the `clip` tuple, if exists.
    result = _clamp((x - mean) / stdev, self._lb, self._ub)

    if result_as_numpy:
        # If we are to return the result as a numpy array, we do the
        # necessary conversion.
        result = result.cpu().numpy()

    # Finally, return the result
    return result
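
The result_as_numpy behavior described above can be summarized with a short sketch (assuming the class is importable from the module path shown in the source reference):

import numpy as np
import torch

from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=3, dtype="float32")
rn.update(torch.randn(10, 3))

as_tensor = rn.normalize(torch.zeros(3))  # tensor in -> tensor out
as_array = rn.normalize(np.zeros(3, dtype="float32"))  # non-tensor in -> numpy out
forced = rn.normalize(torch.zeros(3), result_as_numpy=True)  # numpy, regardless of input

assert isinstance(as_tensor, torch.Tensor)
assert isinstance(as_array, np.ndarray)
assert isinstance(forced, np.ndarray)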
reset(self)

Remove all the collected observation data.

Source code in evotorch/neuroevolution/net/runningnorm.py
def reset(self):
    """
    Remove all the collected observation data.
    """
    self._sum = None
    self._sum_of_squares = None
    self._count = 0
to(self, device)

If the target device is a different device, then make a copy of this RunningNorm instance on the target device. If the target device is the same with this RunningNorm's device, then return this RunningNorm itself.

Parameters:

Name Type Description Default
device Union[str, torch.device]

The target device.

required

Returns:

Type Description
RunningNorm

The RunningNorm on the target device. This can be a copy, or the original RunningNorm instance itself.

Source code in evotorch/neuroevolution/net/runningnorm.py
def to(self, device: Device) -> "RunningNorm":
    """
    If the target device is a different device, then make a copy of this
    RunningNorm instance on the target device.
    If the target device is the same with this RunningNorm's device, then
    return this RunningNorm itself.

    Args:
        device: The target device.
    Returns:
        The RunningNorm on the target device. This can be a copy, or the
        original RunningNorm instance itself.
    """
    if torch.device(device) == torch.device(self.device):
        return self
    else:
        new_running_norm = object.__new__(type(self))

        already_handled = {"_sum", "_sum_of_squares", "_device"}
        new_running_norm._sum = self._sum.to(device)
        new_running_norm._sum_of_squares = self._sum_of_squares.to(device)
        new_running_norm._device = device

        for k, v in self.__dict__.items():
            if k not in already_handled:
                setattr(new_running_norm, k, deepcopy(v))

        return new_running_norm
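
A brief sketch of these copy-on-device-change semantics (only the cpu branch is exercised here, so this runs without a GPU):

import torch

from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=3, dtype="float32")  # device defaults to "cpu"
rn.update(torch.randn(8, 3))

# Same device: the very same instance is returned, no copy is made.
assert rn.to("cpu") is rn

# On a CUDA-capable machine, rn.to("cuda") would instead return a new copy
# whose stored sum and sum-of-squares tensors live on the GPU.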
to_layer(self)

Make a PyTorch module which normalizes its inputs.

Returns:

Type Description
ObsNormLayer

An ObsNormLayer instance.

Source code in evotorch/neuroevolution/net/runningnorm.py
def to_layer(self) -> "ObsNormLayer":
    """
    Make a PyTorch module which normalizes its inputs.

    Returns:
        An ObsNormLayer instance.
    """
    mean, stdev = self.stats
    low = self.low
    high = self.high
    return ObsNormLayer(mean=mean, stdev=stdev, low=low, high=high)
update(self, x, mask=None, *, verify=True)

Update the stored stats with new observation data.

Parameters:

Name Type Description Default
x Union[Iterable, RunningNorm]

The new observation(s), as a PyTorch tensor, or any Iterable that can be converted to a PyTorch tensor, or another RunningNorm instance. If given as a tensor or as an Iterable, the shape of x can be the same as the observation shape, or it can be augmented with an extra leftmost dimension. In the case of augmented dimension, x is interpreted not as a single observation, but as a batch of observations. If x is another RunningNorm instance, the stats stored by this RunningNorm instance will be updated with all the data stored by x.

required
mask Optional[Iterable]

Can be given as a 1-dimensional Iterable of booleans ONLY if x represents a batch of observations. If a mask is provided, the i-th observation within the observation batch x will be taken into account only if the i-th item of the mask is True.

None
verify bool

Whether or not to verify the shape of the given Iterable objects. The default is True.

True
Source code in evotorch/neuroevolution/net/runningnorm.py
@torch.no_grad()
def update(self, x: Union[Iterable, "RunningNorm"], mask: Optional[Iterable] = None, *, verify: bool = True):
    """
    Update the stored stats with new observation data.

    Args:
        x: The new observation(s), as a PyTorch tensor, or any Iterable
            that can be converted to a PyTorch tensor, or another
            RunningNorm instance.
            If given as a tensor or as an Iterable, the shape of `x` can
            be the same as the observation shape, or it can be augmented
            with an extra leftmost dimension.
            In the case of augmented dimension, `x` is interpreted not as
            a single observation, but as a batch of observations.
            If `x` is another RunningNorm instance, the stats stored by
            this RunningNorm instance will be updated with all the data
            stored by `x`.
        mask: Can be given as a 1-dimensional Iterable of booleans ONLY
            if `x` represents a batch of observations.
            If a `mask` is provided, the i-th observation within the
            observation batch `x` will be taken into account only if
            the i-th item of the `mask` is True.
        verify: Whether or not to verify the shape of the given Iterable
            objects. The default is True.
    """
    if isinstance(x, RunningNorm):
        # If we are to update our stats according to another RunningNorm instance

        if x._count > 0:
            # We bother only if x is non-empty

            if mask is not None:
                # We were given another RunningNorm, not a batch of observations.
                # So, we do not expect to receive a mask tensor.
                # If a mask was provided, then this is an unexpected way of calling this function.
                # We therefore raise an error.
                raise ValueError(
                    "The `mask` argument is expected as None if the first argument is a RunningNorm."
                    " However, `mask` is found as something other than None."
                )

            if self._shape != x._shape:
                # If the shapes of this RunningNorm and of the other RunningNorm
                # do not match, then we cannot use `x` for updating our stats.
                # It might be the case that `x` was initialized for another
                # task, with differently sized observations.
                # We therefore raise an error.
                raise ValueError(
                    f"The RunningNorm to be updated has the shape {self._shape}"
                    f" The other RunningNorm has the shape {self._shape}"
                    f" These shapes are incompatible."
                )

            if self._has_no_data():
                # If this RunningNorm has no data at all, then we clone the
                # data of x.
                self._sum = self._like_its_own(x._sum.clone())
                self._sum_of_squares = self._like_its_own(x._sum_of_squares.clone())
                self._count = x._count
            elif self._has_data():
                # If this RunningNorm has its own data, then we update the
                # stored data with the data stored by x.
                self._sum += self._like_its_own(x._sum)
                self._sum_of_squares += self._like_its_own(x._sum_of_squares)
                self._count += x._count
            else:
                assert False, "RunningNorm is in an invalid state! This might be a bug."
    else:
        # This is the case where the received argument x is not a
        # RunningNorm object, but an Iterable.

        if verify:
            # If we have the `verify` flag, then we make sure that
            # x is a tensor of the correct shape
            x = self._verify(x)

        if x.ndim == self._ndim:
            # If the shape of x is exactly the same with the observation shape
            # then we assume that x represents a single observation, and not a
            # batch of observations.

            if mask is not None:
                # Since we are dealing with a single observation,
                # we do not expect to receive a mask argument.
                # If the mask argument was provided, then this is an unexpected
                # usage of this function.
                # We therefore raise an error.
                raise ValueError(
                    "The `mask` argument is expected as None if the first argument is a single observation"
                    " (i.e. not a batch of observations, with an extra leftmost dimension)."
                    " However, `mask` is found as something other than None."
                )

            # Since x is a single observation,
            # the sum of observations extracted from x is x itself,
            # and the sum of squared observations extracted from x is
            # the square of x itself.
            sum_of_x = x
            sum_of_x_squared = x.square()
            # We extracted a single observation from x
            n = 1
        elif x.ndim == (self._ndim + 1):
            # If the number of dimensions of x is one more than the number
            # of dimensions of this RunningNorm, then we assume that x is a batch
            # of observations.

            if mask is not None:
                # If a mask is provided, then we first make sure that it is a tensor
                # of dtype bool in the correct device.
                mask = torch.as_tensor(mask, dtype=torch.bool, device=self._device)

                if mask.ndim != 1:
                    # We expect the mask to be 1-dimensional.
                    # If not, we raise an error.
                    raise ValueError(
                        f"The `mask` tensor was expected as a 1-dimensional tensor."
                        f" However, its shape is {mask.shape}."
                    )

                if len(mask) != x.shape[0]:
                    # If the length of the mask is not the batch size of x,
                    # then there is a mismatch.
                    # We therefore raise an error.
                    raise ValueError(
                        f"The shape of the given tensor is {x.shape}."
                        f" Therefore, the batch size of observations is {x.shape[0]}."
                        f" However, the given `mask` tensor does not has an incompatible length: {len(mask)}."
                    )

                # We compute how many True items we have in the mask.
                # This integer gives us how many observations we extract from x.
                n = int(torch.sum(torch.as_tensor(mask, dtype=torch.int64, device=self._device)))

                # We now re-cast the mask as the observation dtype (so that True items turn to 1.0
                # and False items turn to 0.0), and then increase its number of dimensions so that
                # it can operate directly with x.
                mask = self._like_its_own(mask).reshape(torch.Size([x.shape[0]] + ([1] * (x.ndim - 1))))

                # Finally, we multiply x with the mask. This means that the observations with corresponding
                # mask values as False are zeroed out.
                x = x * mask
            else:
                # This is the case where we did not receive a mask.
                # We can simply say that the number of observations to extract from x
                # is the size of its leftmost dimension, i.e. the batch size.
                n = x.shape[0]

            # With or without a mask, we are now ready to extract the sum and sum of squares
            # from x.
            sum_of_x = torch.sum(x, dim=0)
            sum_of_x_squared = torch.sum(x.square(), dim=0)
        else:
            # This is the case where the number of dimensions of x is unrecognized.
            # This case is actually already checked by the _verify(...) method earlier.
            # This defensive fallback case is only for when verify=False and it turned out
            # that the ndim is invalid.
            raise ValueError(f"Invalid shape: {x.shape}")

        # At this point, we handled all the valid cases regarding the Iterable x,
        # and we have our sum_of_x (sum of all observations), sum_of_squares
        # (sum of all squared observations), and n (number of observations extracted
        # from x).

        if self._has_no_data():
            # If our RunningNorm is empty, the observation data we extracted from x
            # become our RunningNorm's new data.
            self._sum = sum_of_x
            self._sum_of_squares = sum_of_x_squared
            self._count = n
        elif self._has_data():
            # If our RunningNorm is not empty, the stored data is updated with the
            # data extracted from x.
            self._sum += sum_of_x
            self._sum_of_squares += sum_of_x_squared
            self._count += n
        else:
            # This is an erroneous state where the internal data looks neither
            # existent nor completely empty.
            # This might be the result of a bug, or maybe this instance's
            # protected variables were tempered with from the outside.
            assert False, "RunningNorm is in an invalid state! This might be a bug."
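
The masked-batch behavior can be seen in a small sketch (hypothetical data; only the masked-in rows contribute to the stats):

import torch

from evotorch.neuroevolution.net.runningnorm import RunningNorm

rn = RunningNorm(shape=2, dtype="float32")

batch = torch.randn(4, 2)
mask = [True, False, True, False]  # consider only rows 0 and 2

rn.update(batch, mask)
assert rn.count == 2  # only two observations were taken into account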
update_and_normalize(self, x, mask=None)

Update the observation stats according to x, then normalize x.

Parameters:

Name Type Description Default
x Iterable

The observation(s), as a PyTorch tensor, or as an Iterable which can be converted to a PyTorch tensor. The shape of x can be the same as the observation shape, or it can be augmented with an extra leftmost dimension to express a batch of observations.

required
mask Optional[Iterable]

Can be given as a 1-dimensional Iterable of booleans ONLY if x represents a batch of observations. If a mask is provided, the i-th observation within the observation batch x will be taken into account only if the i-th item of the mask is True.

None

Returns:

Type Description
Iterable

The normalized counterpart of the observation(s) expressed by x.

Source code in evotorch/neuroevolution/net/runningnorm.py
@torch.no_grad()
def update_and_normalize(self, x: Iterable, mask: Optional[Iterable] = None) -> Iterable:
    """
    Update the observation stats according to x, then normalize x.

    Args:
        x: The observation(s), as a PyTorch tensor, or as an Iterable
            which can be converted to a PyTorch tensor.
            The shape of x can be the same as the observation shape,
            or it can be augmented with an extra leftmost dimension
            to express a batch of observations.
        mask: Can be given as a 1-dimensional Iterable of booleans ONLY
            if `x` represents a batch of observations.
            If a `mask` is provided, the i-th observation within the
            observation batch `x` will be taken into account only if
            the i-th item of the `mask` is True.
    Returns:
        The normalized counterpart of the observation(s) expressed by x.
    """
    result_as_numpy = not isinstance(x, torch.Tensor)
    x = self._verify(x)

    self.update(x, mask, verify=False)
    result = self.normalize(x, verify=False)

    if result_as_numpy:
        result = result.cpu().numpy()

    return result

runningstat

RunningStat

Tool for efficiently computing the mean and stdev of arrays. The arrays themselves are not stored separately; instead, they are accumulated.

This RunningStat is implemented as a wrapper around RunningNorm. The difference is that the interface of RunningStat is simplified to expect only numpy arrays and only non-vectorized observations. With this simplified interface, RunningStat is meant to be used by GymNE, on classical non-vectorized gym tasks.

Source code in evotorch/neuroevolution/net/runningstat.py
class RunningStat:
    """
    Tool for efficiently computing the mean and stdev of arrays.
    The arrays themselves are not stored separately,
    instead, they are accumulated.

    This RunningStat is implemented as a wrapper around RunningNorm.
    The difference is that the interface of RunningStat is simplified
    to expect only numpy arrays, and expect only non-vectorized
    observations.
    With this simplified interface, RunningStat is meant to be used
    by GymNE, on classical non-vectorized gym tasks.
    """

    def __init__(self):
        """
        `__init__(...)`: Initialize the RunningStat.
        """
        self._rn: Optional[RunningNorm] = None
        self.reset()

    def reset(self):
        """
        Reset the RunningStat to its initial state.
        """
        self._rn = None

    @property
    def count(self) -> int:
        """
        Get the number of arrays accumulated.
        """
        if self._rn is None:
            return 0
        else:
            return self._rn.count

    @property
    def sum(self) -> np.ndarray:
        """
        Get the sum of all accumulated arrays.
        """
        return self._rn.sum.numpy()

    @property
    def sum_of_squares(self) -> np.ndarray:
        """
        Get the sum of squares of all accumulated arrays.
        """
        return self._rn.sum_of_squares.numpy()

    @property
    def mean(self) -> np.ndarray:
        """
        Get the mean of all accumulated arrays.
        """
        return self._rn.mean.numpy()

    @property
    def stdev(self) -> np.ndarray:
        """
        Get the standard deviation of all accumulated arrays.
        """
        return self._rn.stdev.numpy()

    def update(self, x: Union[np.ndarray, "RunningStat"]):
        """
        Accumulate more data into the RunningStat object.
        If the argument is an array, that array is added
        as one more data element.
        If the argument is another RunningStat instance,
        all the stats accumulated by that RunningStat object
        are added into this RunningStat object.
        """
        if isinstance(x, RunningStat):
            if x.count > 0:
                if self._rn is None:
                    self._rn = deepcopy(x._rn)
                else:
                    self._rn.update(x._rn)
        else:
            if self._rn is None:
                x = np.array(x, dtype="float32")
                self._rn = RunningNorm(shape=x.shape, dtype="float32", device="cpu")
            self._rn.update(x)

    def normalize(self, x: Union[np.ndarray, list]) -> np.ndarray:
        """
        Normalize the array x according to the accumulated stats.
        """
        if self._rn is None:
            return x
        else:
            x = np.array(x, dtype="float32")
            return self._rn.normalize(x)

    def __copy__(self):
        return deepcopy(self)

    def __repr__(self) -> str:
        return f"<{self.__class__.__name__}, count: {self.count}>"

    def to(self, device: Union[str, torch.device]) -> "RunningStat":
        """
        If the target device is cpu, return this RunningStat instance itself.
        A RunningStat object is meant to work with numpy arrays. Therefore,
        any device other than the cpu will trigger an error.

        Args:
            device: The target device. Only cpu is supported.
        Returns:
            The original RunningStat.
        """
        if torch.device(device) == torch.device("cpu"):
            return self
        else:
            raise ValueError(
                f"The received target device is {repr(device)}. However, RunningStat can only work on a cpu."
            )

    def to_layer(self) -> nn.Module:
        """
        Make a PyTorch module which normalizes its inputs.

        Returns:
            An ObsNormLayer instance.
        """
        return self._rn.to_layer()
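
A minimal usage sketch of this simplified interface (numpy in, numpy out; one non-vectorized observation per update call; the data is hypothetical):

import numpy as np

from evotorch.neuroevolution.net.runningstat import RunningStat

rs = RunningStat()

# Feed one observation at a time, as numpy arrays.
for _ in range(100):
    rs.update(np.random.randn(3).astype("float32"))

print(rs.count)  # 100
print(rs.mean)   # per-feature means, as a numpy array
normalized = rs.normalize(np.zeros(3, dtype="float32"))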
count: int property readonly

Get the number of arrays accumulated.

mean: ndarray property readonly

Get the mean of all accumulated arrays.

stdev: ndarray property readonly

Get the standard deviation of all accumulated arrays.

sum: ndarray property readonly

Get the sum of all accumulated arrays.

sum_of_squares: ndarray property readonly

Get the sum of squares of all accumulated arrays.

__init__(self) special

__init__(...): Initialize the RunningStat.

Source code in evotorch/neuroevolution/net/runningstat.py
def __init__(self):
    """
    `__init__(...)`: Initialize the RunningStat.
    """
    self._rn: Optional[RunningNorm] = None
    self.reset()
normalize(self, x)

Normalize the array x according to the accumulated stats.

Source code in evotorch/neuroevolution/net/runningstat.py
def normalize(self, x: Union[np.ndarray, list]) -> np.ndarray:
    """
    Normalize the array x according to the accumulated stats.
    """
    if self._rn is None:
        return x
    else:
        x = np.array(x, dtype="float32")
        return self._rn.normalize(x)
reset(self)

Reset the RunningStat to its initial state.

Source code in evotorch/neuroevolution/net/runningstat.py
def reset(self):
    """
    Reset the RunningStat to its initial state.
    """
    self._rn = None
to(self, device)

If the target device is cpu, return this RunningStat instance itself. A RunningStat object is meant to work with numpy arrays. Therefore, any device other than the cpu will trigger an error.

Parameters:

Name Type Description Default
device Union[str, torch.device]

The target device. Only cpu is supported.

required

Returns:

Type Description
RunningStat

The original RunningStat.

Source code in evotorch/neuroevolution/net/runningstat.py
def to(self, device: Union[str, torch.device]) -> "RunningStat":
    """
    If the target device is cpu, return this RunningStat instance itself.
    A RunningStat object is meant to work with numpy arrays. Therefore,
    any device other than the cpu will trigger an error.

    Args:
        device: The target device. Only cpu is supported.
    Returns:
        The original RunningStat.
    """
    if torch.device(device) == torch.device("cpu"):
        return self
    else:
        raise ValueError(
            f"The received target device is {repr(device)}. However, RunningStat can only work on a cpu."
        )
to_layer(self)

Make a PyTorch module which normalizes its inputs.

Returns:

Type Description
Module

An ObsNormLayer instance.

Source code in evotorch/neuroevolution/net/runningstat.py
def to_layer(self) -> nn.Module:
    """
    Make a PyTorch module which normalizes its inputs.

    Returns:
        An ObsNormLayer instance.
    """
    return self._rn.to_layer()
update(self, x)

Accumulate more data into the RunningStat object. If the argument is an array, that array is added as one more data element. If the argument is another RunningStat instance, all the stats accumulated by that RunningStat object are added into this RunningStat object.

Source code in evotorch/neuroevolution/net/runningstat.py
def update(self, x: Union[np.ndarray, "RunningStat"]):
    """
    Accumulate more data into the RunningStat object.
    If the argument is an array, that array is added
    as one more data element.
    If the argument is another RunningStat instance,
    all the stats accumulated by that RunningStat object
    are added into this RunningStat object.
    """
    if isinstance(x, RunningStat):
        if x.count > 0:
            if self._rn is None:
                self._rn = deepcopy(x._rn)
            else:
                self._rn.update(x._rn)
    else:
        if self._rn is None:
            x = np.array(x, dtype="float32")
            self._rn = RunningNorm(shape=x.shape, dtype="float32", device="cpu")
        self._rn.update(x)

statefulmodule

StatefulModule (Module)

A wrapper that provides a stateful interface for recurrent torch modules.

If the torch module to be wrapped is non-recurrent and its forward method has a single input (the input tensor) and a single output (the output tensor), then this wrapper module acts as a no-op wrapper.

If the torch module to be wrapped is recurrent and its forward method has two inputs (the input tensor and an optional second argument for the hidden state) and two outputs (the output tensor and the new hidden state), then this wrapper brings a new forward-passing interface. In this new interface, the forward method has a single input (the input tensor) and a single output (the output tensor). The hidden states, instead of being explicitly requested via a second argument and returned as a second result, are stored and used by the wrapper. When a new series of inputs is to be used, one has to call the reset() method of this wrapper.

Source code in evotorch/neuroevolution/net/statefulmodule.py
class StatefulModule(nn.Module):
    """
    A wrapper that provides a stateful interface for recurrent torch modules.

    If the torch module to be wrapped is non-recurrent and its forward method
    has a single input (the input tensor) and a single output (the output
    tensor), then this wrapper module acts as a no-op wrapper.

    If the torch module to be wrapped is recurrent and its forward method has
    two inputs (the input tensor and an optional second argument for the hidden
    state) and two outputs (the output tensor and the new hidden state), then
    this wrapper brings a new forward-passing interface. In this new interface,
    the forward method has a single input (the input tensor) and a single
    output (the output tensor). The hidden states, instead of being
    explicitly requested via a second argument and returned as a second
    result, are stored and used by the wrapper.
    When a new series of inputs is to be used, one has to call the `reset()`
    method of this wrapper.
    """

    def __init__(self, wrapped_module: nn.Module):
        """
        `__init__(...)`: Initialize the StatefulModule.

        Args:
            wrapped_module: The `torch.nn.Module` instance to wrap.
        """
        super().__init__()

        # Declare the variable that will store the hidden state of wrapped_module, if any.
        self._hidden: Any = None

        # Store the module that is wrapped.
        self.wrapped_module = wrapped_module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._hidden is None:
            # If there is no stored hidden state, then only pass the input tensor to the wrapped module.
            out = self.wrapped_module(x)
        else:
            # If there is a hidden state saved from the previous call to this `forward(...)` method, then pass the
            # input tensor and this stored hidden state.
            out = self.wrapped_module(x, self._hidden)

        if isinstance(out, tuple):
            # If the result of the wrapped module is a tuple, then we assume that the wrapped module returned an
            # output tensor and a hidden state. We assume the first element of this tuple as the output tensor,
            # and the second element as the new hidden state.
            # We set the variable y to the output tensor, and we store the new hidden state via the attribute
            # `_hidden`.
            y, self._hidden = out
        else:
            # If the result of the wrapped module is not a tuple, then we assume that the wrapped module returned
            # only the output tensor. We set the variable y to the output tensor, and set the attribute `_hidden`
            # as None to indicate that there was no hidden state received.
            y = out
            self._hidden = None

        # We return y, which stores the output received by the wrapped module.
        return y

    def reset(self):
        """
        Reset the hidden state, if any.
        """
        self._hidden = None
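
To illustrate the recurrent case, here is a sketch with a hypothetical toy recurrent module whose forward method follows the (input, hidden state) -> (output, new hidden state) convention described above:

import torch
from torch import nn

from evotorch.neuroevolution.net.statefulmodule import StatefulModule


class TinyRecurrent(nn.Module):
    # Hypothetical module: forward(x, h=None) returns (output, new_hidden).
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(4, 4)
        self.h_proj = nn.Linear(4, 4)

    def forward(self, x, h=None):
        if h is None:
            h = torch.zeros(4)
        new_h = torch.tanh(self.in_proj(x) + self.h_proj(h))
        return new_h, new_h


policy = StatefulModule(TinyRecurrent())

y1 = policy(torch.randn(4))  # a hidden state is created and stored internally
y2 = policy(torch.randn(4))  # this call uses the stored hidden state

policy.reset()  # clear the hidden state before a new episode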
__init__(self, wrapped_module) special

__init__(...): Initialize the StatefulModule.

Parameters:

Name Type Description Default
wrapped_module Module

The torch.nn.Module instance to wrap.

required
Source code in evotorch/neuroevolution/net/statefulmodule.py
def __init__(self, wrapped_module: nn.Module):
    """
    `__init__(...)`: Initialize the StatefulModule.

    Args:
        wrapped_module: The `torch.nn.Module` instance to wrap.
    """
    super().__init__()

    # Declare the variable that will store the hidden state of wrapped_module, if any.
    self._hidden: Any = None

    # Store the module that is wrapped.
    self.wrapped_module = wrapped_module
forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself afterwards instead of calling this method directly, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/statefulmodule.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    if self._hidden is None:
        # If there is no stored hidden state, then only pass the input tensor to the wrapped module.
        out = self.wrapped_module(x)
    else:
        # If there is a hidden state saved from the previous call to this `forward(...)` method, then pass the
        # input tensor and this stored hidden state.
        out = self.wrapped_module(x, self._hidden)

    if isinstance(out, tuple):
        # If the result of the wrapped module is a tuple, then we assume that the wrapped module returned an
        # output tensor and a hidden state. We assume the first element of this tuple as the output tensor,
        # and the second element as the new hidden state.
        # We set the variable y to the output tensor, and we store the new hidden state via the attribute
        # `_hidden`.
        y, self._hidden = out
    else:
        # If the result of the wrapped module is not a tuple, then we assume that the wrapped module returned
        # only the output tensor. We set the variable y to the output tensor, and set the attribute `_hidden`
        # as None to indicate that there was no hidden state received.
        y = out
        self._hidden = None

    # We return y, which stores the output received by the wrapped module.
    return y
reset(self)

Reset the hidden state, if any.

Source code in evotorch/neuroevolution/net/statefulmodule.py
def reset(self):
    """
    Reset the hidden state, if any.
    """
    self._hidden = None

ensure_stateful(net)

Ensure that a module is wrapped by StatefulModule.

If the given module is already wrapped by StatefulModule, then the module itself is returned. If the given module is not wrapped by StatefulModule, then this function first wraps the module via a new StatefulModule instance, and then this new wrapper is returned.

Parameters:

Name Type Description Default
net Module

The torch.nn.Module to be wrapped by StatefulModule (if it is not already wrapped by it).

required

Returns:

Type Description
StatefulModule

The module net, wrapped by StatefulModule.

Source code in evotorch/neuroevolution/net/statefulmodule.py
def ensure_stateful(net: nn.Module) -> StatefulModule:
    """
    Ensure that a module is wrapped by StatefulModule.

    If the given module is already wrapped by StatefulModule, then the
    module itself is returned.
    If the given module is not wrapped by StatefulModule, then this function
    first wraps the module via a new StatefulModule instance, and then this
    new wrapper is returned.

    Args:
        net: The `torch.nn.Module` to be wrapped by StatefulModule (if it is
            not already wrapped by it).
    Returns:
        The module `net`, wrapped by StatefulModule.
    """
    if not isinstance(net, StatefulModule):
        return StatefulModule(net)
    return net
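
A short sketch of this idempotent wrapping behavior (assuming both names are importable from the module path shown above):

from torch import nn

from evotorch.neuroevolution.net.statefulmodule import StatefulModule, ensure_stateful

wrapped = ensure_stateful(nn.Linear(5, 8))
assert isinstance(wrapped, StatefulModule)

# An already-wrapped module is returned as-is, not wrapped twice.
assert ensure_stateful(wrapped) is wrapped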

vecrl

This namespace provides various vectorized reinforcement learning utilities.

Policy

A Policy for deciding the actions for a reinforcement learning environment.

This can be seen as a stateful wrapper around a PyTorch module.

Let us assume that we have the following PyTorch module:

from torch import nn

net = nn.Linear(5, 8)

which has 48 parameters (when all the parameters are flattened). Let us randomly generate a parameter vector for our module net:

parameters = torch.randn(48)

We can now prepare a policy:

policy = Policy(net)
policy.set_parameters(parameters)

If we generate a random observation:

observation = torch.randn(5)

We can receive our action as follows:

action = policy(observation)

If the PyTorch module that we wish to wrap is a recurrent network (i.e. a network which expects an optional second argument for the hidden state, and returns a second value which represents the updated hidden state), then, the hidden state is automatically managed by the Policy instance.

Let us assume that we have a recurrent network named recnet.

policy = Policy(recnet)
policy.set_parameters(parameters_of_recnet)

In this case, because the hidden state of the network is internally managed, the usage is still the same as in our previous non-recurrent example:

action = policy(observation)

When using a recurrent module on multiple episodes, it is important to reset the hidden state of the network. This is achieved by the reset method:

policy.reset()
action1 = policy(observation1)

# action2 will be computed with the hidden state generated by the
# previous forward-pass.
action2 = policy(observation2)

policy.reset()

# action3 will be computed according to the renewed hidden state.
action3 = policy(observation3)

Both for non-recurrent and recurrent networks, it is possible to perform vectorized operations. For now, let us return to our first non-recurrent example:

net = nn.Linear(5, 8)

Instead of generating only one parameter vector, we now generate a batch of parameter vectors. Let us say that our batch size is 10:

batch_of_parameters = torch.randn(10, 48)

Like we did in the non-batched examples, we can do:

policy = Policy(net)
policy.set_parameters(batch_of_parameters)

Because we are now in the batched mode, policy now expects a batch of observations and will return a batch of actions:

batch_of_observations = torch.randn(10, 5)
batch_of_actions = policy(batch_of_observations)

When doing vectorized reinforcement learning with a recurrent module, it can be the case that only some of the environments are finished, and therefore it is necessary to reset the hidden states associated with those environments only. The reset(...) method of Policy accepts an indices argument to specify which of the recurrent network instances are to be reset. For example, if the episodes of the environments with indices 2 and 5 are about to restart (and therefore we wish to reset the states of the networks with indices 2 and 5), then, we can do:

policy.reset(torch.tensor([2, 5]))
Source code in evotorch/neuroevolution/net/vecrl.py
class Policy:
    """
    A Policy for deciding the actions for a reinforcement learning environment.

    This can be seen as a stateful wrapper around a PyTorch module.

    Let us assume that we have the following PyTorch module:

    ```python
    from torch import nn

    net = nn.Linear(5, 8)
    ```

    which has 48 parameters (when all the parameters are flattened).
    Let us randomly generate a parameter vector for our module `net`:

    ```python
    parameters = torch.randn(48)
    ```

    We can now prepare a policy:

    ```python
    policy = Policy(net)
    policy.set_parameters(parameters)
    ```

    If we generate a random observation:

    ```python
    observation = torch.randn(5)
    ```

    We can receive our action as follows:

    ```python
    action = policy(observation)
    ```

    If the PyTorch module that we wish to wrap is a recurrent network (i.e.
    a network which expects an optional second argument for the hidden state,
    and returns a second value which represents the updated hidden state),
    then, the hidden state is automatically managed by the Policy instance.

    Let us assume that we have a recurrent network named `recnet`.

    ```python
    policy = Policy(recnet)
    policy.set_parameters(parameters_of_recnet)
    ```

    In this case, because the hidden state of the network is internally
    managed, the usage is still the same as in our previous non-recurrent
    example:

    ```python
    action = policy(observation)
    ```

    When using a recurrent module on multiple episodes, it is important
    to reset the hidden state of the network. This is achieved by the
    reset method:

    ```python
    policy.reset()
    action1 = policy(observation1)

    # action2 will be computed with the hidden state generated by the
    # previous forward-pass.
    action2 = policy(observation2)

    policy.reset()

    # action3 will be computed according to the renewed hidden state.
    action3 = policy(observation3)
    ```

    Both for non-recurrent and recurrent networks, it is possible to
    perform vectorized operations. For now, let us return to our
    first non-recurrent example:

    ```python
    net = nn.Linear(5, 8)
    ```

    Instead of generating only one parameter vector, we now generate
    a batch of parameter vectors. Let us say that our batch size is 10:

    ```python
    batch_of_parameters = torch.randn(10, 48)
    ```

    Like we did in the non-batched examples, we can do:

    ```python
    policy = Policy(net)
    policy.set_parameters(batch_of_parameters)
    ```

    Because we are now in the batched mode, `policy` now expects a batch
    of observations and will return a batch of actions:

    ```python
    batch_of_observations = torch.randn(10, 5)
    batch_of_actions = policy(batch_of_observations)
    ```

    When doing vectorized reinforcement learning with a recurrent module,
    it can be the case that only some of the environments are finished,
    and therefore it is necessary to reset the hidden states associated
    with those environments only. The `reset(...)` method of Policy
    accepts an `indices` argument to specify which of the recurrent network
    instances are to be reset. For example, if the episodes of the
    environments with indices 2 and 5 are about to restart (and therefore
    we wish to reset the states of the networks with indices 2 and 5),
    then, we can do:

    ```python
    policy.reset(torch.tensor([2, 5]))
    ```
    """

    def __init__(self, net: Union[str, Callable, nn.Module], **kwargs):
        """
        `__init__(...)`: Initialize the Policy.

        Args:
            net: The network to be wrapped by the Policy object.
                This can be a string, a Callable (e.g. a `torch.nn.Module`
                subclass), or a `torch.nn.Module` instance.
                When this argument is a string, the network will be
                created with the help of the function
                `evotorch.neuroevolution.net.str_to_net(...)` and then
                wrapped. Please see the `str_to_net(...)` function's
                documentation for details regarding how a network structure
                can be expressed via strings.
            kwargs: Expected in the form of additional keyword arguments,
                these keyword arguments will be passed to the provided
                Callable object (if the argument `net` is a Callable)
                or to `str_to_net(...)` (if the argument `net` is a string)
                at the moment of generating the network.
                If the argument `net` is a `torch.nn.Module` instance,
                having any additional keyword arguments will trigger an
                error, because the network is already instantiated and
                therefore, it is not possible to pass these keyword arguments.
        """
        from ..net import str_to_net
        from ..net.functional import ModuleExpectingFlatParameters, make_functional_module

        if isinstance(net, str):
            self.__module = str_to_net(net, **kwargs)
        elif isinstance(net, nn.Module):
            if len(kwargs) > 0:
                raise ValueError(
                    f"When the network is given as an `nn.Module` instance, extra network arguments cannot be used"
                    f" (because the network is already instantiated)."
                    f" However, these extra keyword arguments were received: {kwargs}."
                )
            self.__module = net
        elif isinstance(net, Callable):
            self.__module = net(**kwargs)
        else:
            raise TypeError(
                f"The class `Policy` expected a string or an `nn.Module` instance, or a Callable, but received {net}"
                f" (whose type is {type(net)})."
            )

        self.__fmodule: ModuleExpectingFlatParameters = make_functional_module(self.__module)
        self.__state: Any = None
        self.__parameters: Optional[torch.Tensor] = None

    def set_parameters(self, parameters: torch.Tensor, indices: Optional[MaskOrIndices] = None, *, reset: bool = True):
        """
        Set the parameters of the policy.

        Args:
            parameters: A 1-dimensional or a 2-dimensional tensor containing
                the flattened parameters to be used with the neural network.
                If the given parameters are two-dimensional, then, given that
                the leftmost size of the parameter tensor is `n`, the
                observations will be expected in a batch with leftmost size
                `n`, and the returned actions will also be in a batch,
                again with the leftmost size `n`.
            indices: For when the parameters were previously given via a
                2-dimensional tensor, provide this argument if you would like
                to change only some rows of the previously given parameters.
                For example, if `indices` is given as `torch.tensor([2, 4])`
                and the argument `parameters` is given as a 2-dimensional
                tensor with leftmost size 2, then the rows with indices
                2 and 4 will be replaced by these new parameters provided
                via the argument `parameters`.
            reset: If given as True, the hidden states of the networks whose
                parameters just changed will be reset. If `indices` was not
                provided at all, then this means that the parameters of all
                networks are modified, in which case, all the hidden states
                will be reset.
                If given as False, no such resetting will be done.
        """
        if self.__parameters is None:
            if indices is not None:
                raise ValueError(
                    "The argument `indices` can be used only if network parameters were previously specified."
                    " However, it seems that the method `set_parameters(...)` was not called before."
                )
            self.__parameters = parameters
        else:
            if indices is None:
                self.__parameters = parameters
            else:
                self.__parameters[indices] = parameters

        if reset:
            self.reset(indices)

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        """
        Pass the given observations through the network.

        Args:
            x: The observations, as a PyTorch tensor.
                If the parameters were given (via the method
                `set_parameters(...)`) as a 1-dimensional tensor, then this
                argument is expected to store a single observation.
                If the parameters were given as a 2-dimensional tensor,
                then, this argument is expected to store a batch of
                observations, and the leftmost size of this observation
                tensor must match with the leftmost size of the parameter
                tensor.
        Returns:
            The output tensor, which represents the action to take.
        """
        if self.__parameters is None:
            raise ValueError("Please use the method `set_parameters(...)` before calling the policy.")

        if self.__state is None:
            further_args = (x,)
        else:
            further_args = (x, self.__state)

        parameters = self.__parameters
        ndim = parameters.ndim
        if ndim == 1:
            result = self.__fmodule(parameters, *further_args)
        elif ndim == 2:
            vmapped = vmap(self.__fmodule)
            result = vmapped(parameters, *further_args)
        else:
            raise ValueError(
                f"Expected the parameters as a 1 or 2 dimensional tensor."
                f" However, the received parameters tensor has {ndim} dimensions."
            )

        if isinstance(result, torch.Tensor):
            return result
        elif isinstance(result, tuple):
            result, state = result
            self.__state = state
            return result
        else:
            raise TypeError(f"The torch module used by the Policy returned an unexpected object: {result}")

    def reset(self, indices: Optional[MaskOrIndices] = None, *, copy: bool = True):
        """
        Reset the hidden states, if the contained module is a recurrent network.

        Args:
            indices: Optionally a sequence of integers or a sequence of
                booleans, specifying which networks' states will be
                reset. If left as None, then the states of all the networks
                will be reset.
            copy: When `indices` is given as something other than None,
                if `copy` is given as True, then the resetting will NOT
                be done in-place. Instead, a new copy of the hidden state
                will first be created, and then the specified regions
                of this new copy will be cleared, and then finally this
                modified copy will be declared as the new hidden state.
                It is a common practice for recurrent neural network
                implementations to return the same tensor both as its
                output and as (part of) its hidden state. With `copy=False`,
                the resetting would be done in-place, and the action
                tensor could be involuntarily reset as well.
                This in-place modification could cause silent bugs
                if the unintended modification on the action tensor
                happens BEFORE the action is sent to the reinforcement
                learning environment.
                To prevent such situations, the default value for the argument
                `copy` is True.
        """
        if indices is None:
            self.__state = None
        else:
            if self.__state is not None:
                with torch.no_grad():
                    if copy:
                        self.__state = deepcopy(self.__state)
                    reset_tensors(self.__state, indices)

    @property
    def parameters(self) -> torch.Tensor:
        """
        The currently used parameters.
        """
        return self.__parameters

    @property
    def h(self) -> Optional[torch.Tensor]:
        """
        The hidden state of the contained recurrent network, if any.

        If the contained recurrent network did not generate a hidden state
        yet, or if the contained network is not recurrent, then the result
        will be None.
        """
        return self.__state

    @property
    def parameter_length(self) -> int:
        """
        Length of the parameter tensor.
        """
        return self.__fmodule.parameter_length

    @property
    def wrapped_module(self) -> nn.Module:
        """
        The wrapped `torch.nn.Module` instance.
        """
        return self.__module

    def to_torch_module(self, parameter_vector: torch.Tensor) -> nn.Module:
        """
        Get a copy of the contained network, parameterized as specified.

        Args:
            parameter_vector: The parameters to be used by the new network.
        Returns:
            Copy of the contained network, as a `torch.nn.Module` instance.
        """
        with torch.no_grad():
            net = deepcopy(self.__module).to(parameter_vector.device)
            nnu.vector_to_parameters(parameter_vector, net.parameters())
        return net
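
Building on the examples in the docstring above, here is a hedged sketch of updating only some rows of a previously given parameter batch via the indices argument (the parameter length 48 comes from the Linear(5, 8) example: 5 * 8 weights plus 8 biases):

import torch
from torch import nn

from evotorch.neuroevolution.net.vecrl import Policy

policy = Policy(nn.Linear(5, 8))

# Batched mode: 10 parameter vectors, each of length 48.
policy.set_parameters(torch.randn(10, 48))
batch_of_actions = policy(torch.randn(10, 5))

# Replace only the parameter rows with indices 2 and 5; with the default
# reset=True, the hidden states of those two networks (if any) are reset.
policy.set_parameters(torch.randn(2, 48), indices=torch.tensor([2, 5]))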
h: Optional[torch.Tensor] property readonly

The hidden state of the contained recurrent network, if any.

If the contained recurrent network did not generate a hidden state yet, or if the contained network is not recurrent, then the result will be None.

parameter_length: int property readonly

Length of the parameter tensor.

parameters: Tensor property readonly

The currently used parameters.

wrapped_module: Module property readonly

The wrapped torch.nn.Module instance.

__call__(self, x) special

Pass the given observations through the network.

Parameters:

Name Type Description Default
x Tensor

The observations, as a PyTorch tensor. If the parameters were given (via the method set_parameters(...)) as a 1-dimensional tensor, then this argument is expected to store a single observation. If the parameters were given as a 2-dimensional tensor, then, this argument is expected to store a batch of observations, and the leftmost size of this observation tensor must match with the leftmost size of the parameter tensor.

required

Returns:

Type Description
Tensor

The output tensor, which represents the action to take.

Source code in evotorch/neuroevolution/net/vecrl.py
def __call__(self, x: torch.Tensor) -> torch.Tensor:
    """
    Pass the given observations through the network.

    Args:
        x: The observations, as a PyTorch tensor.
            If the parameters were given (via the method
            `set_parameters(...)`) as a 1-dimensional tensor, then this
            argument is expected to store a single observation.
            If the parameters were given as a 2-dimensional tensor,
            then, this argument is expected to store a batch of
            observations, and the leftmost size of this observation
            tensor must match with the leftmost size of the parameter
            tensor.
    Returns:
        The output tensor, which represents the action to take.
    """
    if self.__parameters is None:
        raise ValueError("Please use the method `set_parameters(...)` before calling the policy.")

    if self.__state is None:
        further_args = (x,)
    else:
        further_args = (x, self.__state)

    parameters = self.__parameters
    ndim = parameters.ndim
    if ndim == 1:
        result = self.__fmodule(parameters, *further_args)
    elif ndim == 2:
        vmapped = vmap(self.__fmodule)
        result = vmapped(parameters, *further_args)
    else:
        raise ValueError(
            f"Expected the parameters as a 1 or 2 dimensional tensor."
            f" However, the received parameters tensor has {ndim} dimensions."
        )

    if isinstance(result, torch.Tensor):
        return result
    elif isinstance(result, tuple):
        result, state = result
        self.__state = state
        return result
    else:
        raise TypeError(f"The torch module used by the Policy returned an unexpected object: {result}")
__init__(self, net, **kwargs) special

__init__(...): Initialize the Policy.

Parameters:

Name Type Description Default
net Union[str, Callable, torch.nn.modules.module.Module]

The network to be wrapped by the Policy object. This can be a string, a Callable (e.g. a torch.nn.Module subclass), or a torch.nn.Module instance. When this argument is a string, the network will be created with the help of the function evotorch.neuroevolution.net.str_to_net(...) and then wrapped. Please see the str_to_net(...) function's documentation for details regarding how a network structure can be expressed via strings.

required
kwargs

Expected in the form of additional keyword arguments, these keyword arguments will be passed to the provided Callable object (if the argument net is a Callable) or to str_to_net(...) (if the argument net is a string) at the moment of generating the network. If the argument net is a torch.nn.Module instance, having any additional keyword arguments will trigger an error, because the network is already instantiated and therefore, it is not possible to pass these keyword arguments.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(self, net: Union[str, Callable, nn.Module], **kwargs):
    """
    `__init__(...)`: Initialize the Policy.

    Args:
        net: The network to be wrapped by the Policy object.
            This can be a string, a Callable (e.g. a `torch.nn.Module`
            subclass), or a `torch.nn.Module` instance.
            When this argument is a string, the network will be
            created with the help of the function
            `evotorch.neuroevolution.net.str_to_net(...)` and then
            wrapped. Please see the `str_to_net(...)` function's
            documentation for details regarding how a network structure
            can be expressed via strings.
        kwargs: Expected in the form of additional keyword arguments,
            these keyword arguments will be passed to the provided
            Callable object (if the argument `net` is a Callable)
            or to `str_to_net(...)` (if the argument `net` is a string)
            at the moment of generating the network.
            If the argument `net` is a `torch.nn.Module` instance,
            having any additional keyword arguments will trigger an
            error, because the network is already instantiated and
            therefore, it is not possible to pass these keyword arguments.
    """
    from ..net import str_to_net
    from ..net.functional import ModuleExpectingFlatParameters, make_functional_module

    if isinstance(net, str):
        self.__module = str_to_net(net, **kwargs)
    elif isinstance(net, nn.Module):
        if len(kwargs) > 0:
            raise ValueError(
                f"When the network is given as an `nn.Module` instance, extra network arguments cannot be used"
                f" (because the network is already instantiated)."
                f" However, these extra keyword arguments were received: {kwargs}."
            )
        self.__module = net
    elif isinstance(net, Callable):
        self.__module = net(**kwargs)
    else:
        raise TypeError(
            f"The class `Policy` expected a string or an `nn.Module` instance, or a Callable, but received {net}"
            f" (whose type is {type(net)})."
        )

    self.__fmodule: ModuleExpectingFlatParameters = make_functional_module(self.__module)
    self.__state: Any = None
    self.__parameters: Optional[torch.Tensor] = None
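
A short sketch of the three accepted forms of the `net` argument (the structure string and the layer sizes below are illustrative assumptions):

```python
from torch import nn

from evotorch.neuroevolution.net.vecrl import Policy

# From a structure string (parsed by `str_to_net(...)`).
policy_a = Policy("Linear(4, 8) >> Tanh() >> Linear(8, 2)")

# From an already-instantiated module; extra keyword arguments would raise an error here.
policy_b = Policy(nn.Linear(4, 2))

# From a Callable; the keyword arguments are forwarded when the network is built.
policy_c = Policy(nn.Linear, in_features=4, out_features=2)
```
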
reset(self, indices=None, *, copy=True)

Reset the hidden states, if the contained module is a recurrent network.

Parameters:

Name Type Description Default
indices Union[int, Iterable]

Optionally a sequence of integers or a sequence of booleans, specifying which networks' states will be reset. If left as None, then the states of all the networks will be reset.

None
copy bool

When indices is given as something other than None, if copy is given as True, then the resetting will NOT be done in-place. Instead, a new copy of the hidden state will first be created, and then the specified regions of this new copy will be cleared, and then finally this modified copy will be declared as the new hidden state. It is a common practice for recurrent neural network implementations to return the same tensor both as its output and as (part of) its hidden state. With copy=False, the resetting would be done in-place, and the action tensor could be involuntarily reset as well. This in-place modification could cause silent bugs if the unintended modification on the action tensor happens BEFORE the action is sent to the reinforcement learning environment. To prevent such situations, the default value for the argument copy is True.

True
Source code in evotorch/neuroevolution/net/vecrl.py
def reset(self, indices: Optional[MaskOrIndices] = None, *, copy: bool = True):
    """
    Reset the hidden states, if the contained module is a recurrent network.

    Args:
        indices: Optionally a sequence of integers or a sequence of
            booleans, specifying which networks' states will be
            reset. If left as None, then the states of all the networks
            will be reset.
        copy: When `indices` is given as something other than None,
            if `copy` is given as True, then the resetting will NOT
            be done in-place. Instead, a new copy of the hidden state
            will first be created, and then the specified regions
            of this new copy will be cleared, and then finally this
            modified copy will be declared as the new hidden state.
            It is a common practice for recurrent neural network
            implementations to return the same tensor both as its
            output and as (part of) its hidden state. With `copy=False`,
            the resetting would be done in-place, and the action
            tensor could be involuntarily reset as well.
            This in-place modification could cause silent bugs
            if the unintended modification on the action tensor
            happens BEFORE the action is sent to the reinforcement
            learning environment.
            To prevent such situations, the default value for the argument
            `copy` is True.
    """
    if indices is None:
        self.__state = None
    else:
        if self.__state is not None:
            with torch.no_grad():
                if copy:
                    self.__state = deepcopy(self.__state)
                reset_tensors(self.__state, indices)
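
A minimal sketch of resetting hidden states. The recurrent structure string "RNN(4, 8) >> Linear(8, 2)" is an assumption made for this example; any recurrent network accepted by `str_to_net(...)` would work similarly.

```python
import torch

from evotorch.neuroevolution.net.vecrl import Policy

policy = Policy("RNN(4, 8) >> Linear(8, 2)")  # assumed recurrent structure
policy.set_parameters(torch.zeros(8, policy.parameter_length))
policy(torch.randn(8, 4))  # the forward pass creates hidden states

policy.reset(indices=[1, 5])  # reset only the states of networks 1 and 5
policy.reset()                # reset the states of all 8 networks
```
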
set_parameters(self, parameters, indices=None, *, reset=True)

Set the parameters of the policy.

Parameters:

Name Type Description Default
parameters Tensor

A 1-dimensional or a 2-dimensional tensor containing the flattened parameters to be used with the neural network. If the given parameters are two-dimensional, then, given that the leftmost size of the parameter tensor is n, the observations will be expected in a batch with leftmost size n, and the returned actions will also be in a batch, again with the leftmost size n.

required
indices Union[int, Iterable]

For when the parameters were previously given via a 2-dimensional tensor, provide this argument if you would like to change only some rows of the previously given parameters. For example, if indices is given as torch.tensor([2, 4]) and the argument parameters is given as a 2-dimensional tensor with leftmost size 2, then the rows with indices 2 and 4 will be replaced by these new parameters provided via the argument parameters.

None
reset bool

If given as True, the hidden states of the networks whose parameters just changed will be reset. If indices was not provided at all, then this means that the parameters of all networks are modified, in which case, all the hidden states will be reset. If given as False, no such resetting will be done.

True
Source code in evotorch/neuroevolution/net/vecrl.py
def set_parameters(self, parameters: torch.Tensor, indices: Optional[MaskOrIndices] = None, *, reset: bool = True):
    """
    Set the parameters of the policy.

    Args:
        parameters: A 1-dimensional or a 2-dimensional tensor containing
            the flattened parameters to be used with the neural network.
            If the given parameters are two-dimensional, then, given that
            the leftmost size of the parameter tensor is `n`, the
            observations will be expected in a batch with leftmost size
            `n`, and the returned actions will also be in a batch,
            again with the leftmost size `n`.
        indices: For when the parameters were previously given via a
            2-dimensional tensor, provide this argument if you would like
            to change only some rows of the previously given parameters.
            For example, if `indices` is given as `torch.tensor([2, 4])`
            and the argument `parameters` is given as a 2-dimensional
            tensor with leftmost size 2, then the rows with indices
            2 and 4 will be replaced by these new parameters provided
            via the argument `parameters`.
        reset: If given as True, the hidden states of the networks whose
            parameters just changed will be reset. If `indices` was not
            provided at all, then this means that the parameters of all
            networks are modified, in which case, all the hidden states
            will be reset.
            If given as False, no such resetting will be done.
    """
    if self.__parameters is None:
        if indices is not None:
            raise ValueError(
                "The argument `indices` can be used only if network parameters were previously specified."
                " However, it seems that the method `set_parameters(...)` was not called before."
            )
        self.__parameters = parameters
    else:
        if indices is None:
            self.__parameters = parameters
        else:
            self.__parameters[indices] = parameters

    if reset:
        self.reset(indices)
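
A short sketch of partially updating a previously given 2-dimensional parameter tensor (the structure string and the population size are illustrative assumptions):

```python
import torch

from evotorch.neuroevolution.net.vecrl import Policy

policy = Policy("Linear(3, 2)")  # assumed structure string

# Parameters for a population of 6 networks.
policy.set_parameters(torch.randn(6, policy.parameter_length))

# Later, replace only rows 2 and 4; by default, the hidden states of
# those two networks (if any) are reset as well.
new_rows = torch.randn(2, policy.parameter_length)
policy.set_parameters(new_rows, indices=torch.tensor([2, 4]))
```
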
to_torch_module(self, parameter_vector)

Get a copy of the contained network, parameterized as specified.

Parameters:

Name Type Description Default
parameter_vector Tensor

The parameters to be used by the new network.

required

Returns:

Type Description
Module

Copy of the contained network, as a torch.nn.Module instance.

Source code in evotorch/neuroevolution/net/vecrl.py
def to_torch_module(self, parameter_vector: torch.Tensor) -> nn.Module:
    """
    Get a copy of the contained network, parameterized as specified.

    Args:
        parameter_vector: The parameters to be used by the new network.
    Returns:
        Copy of the contained network, as a `torch.nn.Module` instance.
    """
    with torch.no_grad():
        net = deepcopy(self.__module).to(parameter_vector.device)
        nnu.vector_to_parameters(parameter_vector, net.parameters())
    return net
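
A minimal sketch, assuming an illustrative structure string, of extracting a standalone module from a parameter vector (e.g. the best solution found by a search algorithm):

```python
import torch

from evotorch.neuroevolution.net.vecrl import Policy

policy = Policy("Linear(3, 2)")  # assumed structure string
best_params = torch.randn(policy.parameter_length)

# An independent torch.nn.Module using these parameters, usable for
# saving to disk or for running outside of the Policy abstraction.
net = policy.to_torch_module(best_params)
torch.save(net.state_dict(), "policy.pt")
```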

TorchWrapper (Wrapper)

A gym wrapper which ensures that the actions, observations, rewards, and the 'done' values are expressed as PyTorch tensors.

Source code in evotorch/neuroevolution/net/vecrl.py
class TorchWrapper(gym.Wrapper):
    """
    A gym wrapper which ensures that the actions, observations, rewards, and
    the 'done' values are expressed as PyTorch tensors.
    """

    def __init__(
        self,
        env: Union[gym.Env],
        *,
        force_classic_api: bool = False,
        discrete_to_continuous_act: bool = False,
        clip_actions: bool = False,
        **kwargs,
    ):
        """
        `__init__(...)`: Initialize the TorchWrapper.

        Args:
            env: The gym environment to be wrapped.
            force_classic_api: Set this as True if you would like to enable
                the classic API. In the classic API, the `reset(...)` method
                returns only the observation and the `step(...)` method
                returns 4 elements (not 5).
            discrete_to_continuous_act: When this is set as True and the
                wrapped environment has a Discrete action space, this wrapper
                will transform the action space to Box. A Discrete-action
                environment with `n` actions will be converted to a Box-action
                environment where the action length is `n`.
                The index of the largest value within the action vector will
                be applied to the underlying environment.
            clip_actions: Set this as True if you would like to clip the given
                actions so that they conform to the declared boundaries of the
                action space.
            kwargs: Expected in the form of additional keyword arguments.
                These additional keyword arguments are passed to the
                superclass.
        """
        super().__init__(env, **kwargs)

        # Declare the variable that will store the array type of the underlying environment.
        self.__array_type: Optional[str] = None

        if hasattr(env, "single_observation_space"):
            # If the underlying environment has the attribute "single_observation_space",
            # then this is a vectorized environment.
            self.__vectorized = True

            # Get the observation and action spaces.
            obs_space = env.single_observation_space
            act_space = env.single_action_space
        else:
            # If the underlying environment does not have the attribute
            # "single_observation_space", then this is a non-vectorized environment.
            self.__vectorized = False

            # Get the observation and action spaces.
            obs_space = env.observation_space
            act_space = env.action_space

        # Ensure that the observation and action spaces are supported.
        _must_be_supported_space(obs_space)
        _must_be_supported_space(act_space)

        # Store the choice of the user regarding "force_classic_api".
        self.__force_classic_api = bool(force_classic_api)

        if isinstance(act_space, Discrete) and discrete_to_continuous_act:
            # The underlying action space is Discrete and `discrete_to_continuous_act` is given as True.
            # Therefore, we convert the action space to continuous (to Box).

            # Take the shape and the dtype of the discrete action space.
            single_action_shape = (act_space.n,)
            single_action_dtype = torch.from_numpy(np.array([], dtype=act_space.dtype)).dtype

            # We store the integer dtype of the environment.
            self.__discrete_dtype = single_action_dtype

            if self.__vectorized:
                # If the environment is vectorized, we declare the new `action_space` and the `single_action_space`
                # for the environment.
                action_shape = (env.num_envs,) + single_action_shape
                self.single_action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
                self.action_space = Box(float("-inf"), float("inf"), shape=action_shape, dtype=np.float32)
            else:
                # If the environment is not vectorized, we declare the new `action_space` for the environment.
                self.action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
        else:
            # This is the case where we do not transform the action space.
            # The discrete dtype will not be used, so, we set it as None.
            self.__discrete_dtype = None

        if isinstance(act_space, Box) and clip_actions:
            # If the action space is Box and the wrapper is configured to clip the actions, then we store the lower
            # and the upper bounds for the actions.
            self.__act_lb = torch.from_numpy(act_space.low)
            self.__act_ub = torch.from_numpy(act_space.high)
        else:
            # If there will not be any action clipping, then we store the lower and the upper bounds as None.
            self.__act_lb = None
            self.__act_ub = None

    @property
    def array_type(self) -> Optional[str]:
        """
        Get the array type of the wrapped environment.
        This can be "jax", "torch", or "numpy".
        """
        return self.__array_type

    def __infer_array_type(self, observation):
        if self.__array_type is None:
            # If the array type is not determined yet, set it as the array type of the received observation.
            # If the observation has an unrecognized type, set the array type as "numpy".
            self.__array_type = array_type(observation, "numpy")

    def reset(self, *args, **kwargs):
        """Reset the environment"""

        # Call the reset method of the wrapped environment.
        reset_result = self.env.reset(*args, **kwargs)

        if isinstance(reset_result, tuple):
            # If we received a tuple of two elements, then we assume that this is the new gym API.
            # We note that we received an info dictionary.
            got_info = True
            # We keep the received observation and info.
            observation, info = reset_result
        else:
            # If we did not receive a tuple, then we assume that this is the old gym API.
            # We note that we did not receive an info dictionary.
            got_info = False
            # We keep the received observation.
            observation = reset_result
            # We did not receive an info dictionary, so, we set it as an empty dictionary.
            info = {}

        # We understand the array type of the underlying environment from the first observation.
        self.__infer_array_type(observation)

        # Convert the observation to a PyTorch tensor.
        observation = convert_to_torch(observation)

        if self.__force_classic_api:
            # If the option `force_classic_api` was set as True, then we only return the observation.
            return observation
        else:
            # Here we handle the case where `force_classic_api` was set as False.
            if got_info:
                # If we got an additional info dictionary, we return it next to the observation.
                return observation, info
            else:
                # If we did not get any info dictionary, we return only the observation.
                return observation

    def step(self, action, *args, **kwargs):
        """Take a step in the environment"""

        if self.__array_type is None:
            # If the array type is not known yet, then probably `reset()` has not been called yet.
            # We raise an error.
            raise ValueError(
                "Could not understand what type of array this environment works with."
                " Perhaps the `reset()` method has not been called yet?"
            )

        if self.__discrete_dtype is not None:
            # If the wrapped environment is discrete-actioned, then we take the integer counterpart of the action.
            action = torch.argmax(action, dim=-1).to(dtype=self.__discrete_dtype)

        if self.__act_lb is not None:
            # The internal variable `__act_lb` having a value other than None means that the initialization argument
            # `clip_actions` was given as True.
            # Therefore, we clip the actions.
            self.__act_lb = self.__act_lb.to(action.device)
            self.__act_ub = self.__act_ub.to(action.device)
            action = torch.max(action, self.__act_lb)
            action = torch.min(action, self.__act_ub)

        # Convert the action tensor to the expected array type of the underlying environment.
        action = convert_from_torch(action, self.__array_type)

        # Perform the step and get the result.
        result = self.env.step(action, *args, **kwargs)

        if not isinstance(result, tuple):
            # If the `step(...)` method returned anything other than tuple, we raise an error.
            raise TypeError(f"Expected a tuple as the result of the `step()` method, but received a {type(result)}")

        if len(result) == 5:
            # If the result is a tuple of 5 elements, then we note that we are using the new API.
            using_new_api = True
            # Take the observation, reward, two boolean variables done and done2 indicating that the episode(s)
            # has/have ended, and additional info.
            # `done` indicates whether or not the episode(s) reached terminal state(s).
            # `done2` indicates whether or not the episode(s) got truncated because of the timestep limit.
            observation, reward, done, done2, info = result
        elif len(result) == 4:
            # If the result is a tuple of 4 elements, then we note that we are not using the new API.
            using_new_api = False
            # Take the observation, reward, the done boolean flag, and additional info.
            observation, reward, done, info = result
            done2 = None
        else:
            raise ValueError(f"Unexpected number of elements were returned from step(): {len(result)}")

        # Convert the observation, reward, and done variables to PyTorch tensors.
        observation = convert_to_torch(observation)
        reward = convert_to_torch(reward)
        done = convert_to_torch_bool(done)
        if done2 is not None:
            done2 = convert_to_torch_bool(done2)

        if self.__force_classic_api:
            # This is the case where the initialization argument `force_classic_api` was set as True.
            if done2 is not None:
                # We combine the terminal state and truncation signals into a single boolean tensor indicating
                # whether or not the episode(s) ended.
                done = done | done2
            # Return 4 elements, compatible with the classic gym API.
            return observation, reward, done, info
        else:
            # This is the case where the initialization argument `force_classic_api` was set as False.
            if using_new_api:
                # If we are using the new API, then we return the 5-element result.
                return observation, reward, done, done2, info
            else:
                # If we are not using the new API, then we return the 4-element result.
                return observation, reward, done, info
array_type: Optional[str] property readonly

Get the array type of the wrapped environment. This can be "jax", "torch", or "numpy".

__init__(self, env, *, force_classic_api=False, discrete_to_continuous_act=False, clip_actions=False, **kwargs) special

__init__(...): Initialize the TorchWrapper.

Parameters:

Name Type Description Default
env Env

The gym environment to be wrapped.

required
force_classic_api bool

Set this as True if you would like to enable the classic API. In the classic API, the reset(...) method returns only the observation and the step(...) method returns 4 elements (not 5).

False
discrete_to_continuous_act bool

When this is set as True and the wrapped environment has a Discrete action space, this wrapper will transform the action space to Box. A Discrete-action environment with n actions will be converted to a Box-action environment where the action length is n. The index of the largest value within the action vector will be applied to the underlying environment.

False
clip_actions bool

Set this as True if you would like to clip the given actions so that they conform to the declared boundaries of the action space.

False
kwargs

Expected in the form of additional keyword arguments. These additional keyword arguments are passed to the superclass.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(
    self,
    env: Union[gym.Env],
    *,
    force_classic_api: bool = False,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    **kwargs,
):
    """
    `__init__(...)`: Initialize the TorchWrapper.

    Args:
        env: The gym environment to be wrapped.
        force_classic_api: Set this as True if you would like to enable
            the classic API. In the classic API, the `reset(...)` method
            returns only the observation and the `step(...)` method
            returns 4 elements (not 5).
        discrete_to_continuous_act: When this is set as True and the
            wrapped environment has a Discrete action space, this wrapper
            will transform the action space to Box. A Discrete-action
            environment with `n` actions will be converted to a Box-action
            environment where the action length is `n`.
            The index of the largest value within the action vector will
            be applied to the underlying environment.
        clip_actions: Set this as True if you would like to clip the given
            actions so that they conform to the declared boundaries of the
            action space.
        kwargs: Expected in the form of additional keyword arguments.
            These additional keyword arguments are passed to the
            superclass.
    """
    super().__init__(env, **kwargs)

    # Declare the variable that will store the array type of the underlying environment.
    self.__array_type: Optional[str] = None

    if hasattr(env, "single_observation_space"):
        # If the underlying environment has the attribute "single_observation_space",
        # then this is a vectorized environment.
        self.__vectorized = True

        # Get the observation and action spaces.
        obs_space = env.single_observation_space
        act_space = env.single_action_space
    else:
        # If the underlying environment does not have the attribute
        # "single_observation_space", then this is a non-vectorized environment.
        self.__vectorized = False

        # Get the observation and action spaces.
        obs_space = env.observation_space
        act_space = env.action_space

    # Ensure that the observation and action spaces are supported.
    _must_be_supported_space(obs_space)
    _must_be_supported_space(act_space)

    # Store the choice of the user regarding "force_classic_api".
    self.__force_classic_api = bool(force_classic_api)

    if isinstance(act_space, Discrete) and discrete_to_continuous_act:
        # The underlying action space is Discrete and `discrete_to_continuous_act` is given as True.
        # Therefore, we convert the action space to continuous (to Box).

        # Take the shape and the dtype of the discrete action space.
        single_action_shape = (act_space.n,)
        single_action_dtype = torch.from_numpy(np.array([], dtype=act_space.dtype)).dtype

        # We store the integer dtype of the environment.
        self.__discrete_dtype = single_action_dtype

        if self.__vectorized:
            # If the environment is vectorized, we declare the new `action_space` and the `single_action_space`
            # for the environment.
            action_shape = (env.num_envs,) + single_action_shape
            self.single_action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
            self.action_space = Box(float("-inf"), float("inf"), shape=action_shape, dtype=np.float32)
        else:
            # If the environment is not vectorized, we declare the new `action_space` for the environment.
            self.action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
    else:
        # This is the case where we do not transform the action space.
        # The discrete dtype will not be used, so, we set it as None.
        self.__discrete_dtype = None

    if isinstance(act_space, Box) and clip_actions:
        # If the action space is Box and the wrapper is configured to clip the actions, then we store the lower
        # and the upper bounds for the actions.
        self.__act_lb = torch.from_numpy(act_space.low)
        self.__act_ub = torch.from_numpy(act_space.high)
    else:
        # If there will not be any action clipping, then we store the lower and the upper bounds as None.
        self.__act_lb = None
        self.__act_ub = None
reset(self, *args, **kwargs)

Reset the environment

Source code in evotorch/neuroevolution/net/vecrl.py
def reset(self, *args, **kwargs):
    """Reset the environment"""

    # Call the reset method of the wrapped environment.
    reset_result = self.env.reset(*args, **kwargs)

    if isinstance(reset_result, tuple):
        # If we received a tuple of two elements, then we assume that this is the new gym API.
        # We note that we received an info dictionary.
        got_info = True
        # We keep the received observation and info.
        observation, info = reset_result
    else:
        # If we did not receive a tuple, then we assume that this is the old gym API.
        # We note that we did not receive an info dictionary.
        got_info = False
        # We keep the received observation.
        observation = reset_result
        # We did not receive an info dictionary, so, we set it as an empty dictionary.
        info = {}

    # We understand the array type of the underlying environment from the first observation.
    self.__infer_array_type(observation)

    # Convert the observation to a PyTorch tensor.
    observation = convert_to_torch(observation)

    if self.__force_classic_api:
        # If the option `force_classic_api` was set as True, then we only return the observation.
        return observation
    else:
        # Here we handle the case where `force_classic_api` was set as False.
        if got_info:
            # If we got an additional info dictionary, we return it next to the observation.
            return observation, info
        else:
            # If we did not get any info dictionary, we return only the observation.
            return observation
step(self, action, *args, **kwargs)

Take a step in the environment

Source code in evotorch/neuroevolution/net/vecrl.py
def step(self, action, *args, **kwargs):
    """Take a step in the environment"""

    if self.__array_type is None:
        # If the array type is not known yet, then probably `reset()` has not been called yet.
        # We raise an error.
        raise ValueError(
            "Could not understand what type of array this environment works with."
            " Perhaps the `reset()` method has not been called yet?"
        )

    if self.__discrete_dtype is not None:
        # If the wrapped environment is discrete-actioned, then we take the integer counterpart of the action.
        action = torch.argmax(action, dim=-1).to(dtype=self.__discrete_dtype)

    if self.__act_lb is not None:
        # The internal variable `__act_lb` having a value other than None means that the initialization argument
        # `clip_actions` was given as True.
        # Therefore, we clip the actions.
        self.__act_lb = self.__act_lb.to(action.device)
        self.__act_ub = self.__act_ub.to(action.device)
        action = torch.max(action, self.__act_lb)
        action = torch.min(action, self.__act_ub)

    # Convert the action tensor to the expected array type of the underlying environment.
    action = convert_from_torch(action, self.__array_type)

    # Perform the step and get the result.
    result = self.env.step(action, *args, **kwargs)

    if not isinstance(result, tuple):
        # If the `step(...)` method returned anything other than tuple, we raise an error.
        raise TypeError(f"Expected a tuple as the result of the `step()` method, but received a {type(result)}")

    if len(result) == 5:
        # If the result is a tuple of 5 elements, then we note that we are using the new API.
        using_new_api = True
        # Take the observation, reward, two boolean variables done and done2 indicating that the episode(s)
        # has/have ended, and additional info.
        # `done` indicates whether or not the episode(s) reached terminal state(s).
        # `done2` indicates whether or not the episode(s) got truncated because of the timestep limit.
        observation, reward, done, done2, info = result
    elif len(result) == 4:
        # If the result is a tuple of 4 elements, then we note that we are not using the new API.
        using_new_api = False
        # Take the observation, reward, the done boolean flag, and additional info.
        observation, reward, done, info = result
        done2 = None
    else:
        raise ValueError(f"Unexpected number of elements were returned from step(): {len(result)}")

    # Convert the observation, reward, and done variables to PyTorch tensors.
    observation = convert_to_torch(observation)
    reward = convert_to_torch(reward)
    done = convert_to_torch_bool(done)
    if done2 is not None:
        done2 = convert_to_torch_bool(done2)

    if self.__force_classic_api:
        # This is the case where the initialization argument `force_classic_api` was set as True.
        if done2 is not None:
            # We combine the terminal state and truncation signals into a single boolean tensor indicating
            # whether or not the episode(s) ended.
            done = done | done2
        # Return 4 elements, compatible with the classic gym API.
        return observation, reward, done, info
    else:
        # This is the case where the initialization argument `force_classic_api` was set as False.
        if using_new_api:
            # If we are using the new API, then we return the 5-element result.
            return observation, reward, done, done2, info
        else:
            # If we are not using the new API, then we return the 4-element result.
            return observation, reward, done, info
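
As an illustration, here is a minimal sketch of wrapping a single (non-vectorized) gymnasium environment. The environment name "CartPole-v1" and the action values are assumptions made for this example:

```python
import gymnasium as gym
import torch

from evotorch.neuroevolution.net.vecrl import TorchWrapper

env = TorchWrapper(gym.make("CartPole-v1"), discrete_to_continuous_act=True)

obs, info = env.reset()  # obs is now a PyTorch tensor

# CartPole has 2 discrete actions, so the converted Box action has length 2;
# the index of the largest value is what the underlying environment receives.
action = torch.tensor([0.9, 0.1])
obs, reward, terminated, truncated, info = env.step(action)
```
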

array_type(x, fallback=None)

Get the type of an array as a string ("jax", "torch", or "numpy"). If the type of the array cannot be determined and a fallback is provided, then the fallback value will be returned.

Parameters:

Name Type Description Default
x Any

The array whose type will be determined.

required
fallback Optional[str]

Fallback value, as a string, which will be returned if the array type cannot be determined.

None

Returns:

Type Description
str

The array type as a string ("jax", "torch", or "numpy").

Exceptions:

Type Description
TypeError

if the array type cannot be determined and a fallback value is not provided.

Source code in evotorch/neuroevolution/net/vecrl.py
def array_type(x: Any, fallback: Optional[str] = None) -> str:
    """
    Get the type of an array as a string ("jax", "torch", or "numpy").
    If the type of the array cannot be determined and a fallback is provided,
    then the fallback value will be returned.

    Args:
        x: The array whose type will be determined.
        fallback: Fallback value, as a string, which will be returned if the
            array type cannot be determined.
    Returns:
        The array type as a string ("jax", "torch", or "numpy").
    Raises:
        TypeError: if the array type cannot be determined and a fallback
            value is not provided.
    """
    if is_jax_array(x):
        return "jax"
    elif isinstance(x, torch.Tensor):
        return "torch"
    elif isinstance(x, np.ndarray):
        return "numpy"
    elif fallback is not None:
        return fallback
    else:
        raise TypeError(f"The object has an unrecognized type: {type(x)}")

convert_from_torch(x, array_type)

Convert the given PyTorch tensor to an array of the specified type.

Parameters:

Name Type Description Default
x Tensor

The PyTorch array that will be converted.

required
array_type str

Type to which the PyTorch tensor will be converted. Expected as one of these strings: "jax", "torch", "numpy".

required

Returns:

Type Description
Any

The array of the specified type. Can be a JAX array, a numpy array, or PyTorch tensor.

Exceptions:

Type Description
ValueError

if the array type cannot be determined.

Source code in evotorch/neuroevolution/net/vecrl.py
def convert_from_torch(x: torch.Tensor, array_type: str) -> Any:
    """
    Convert the given PyTorch tensor to an array of the specified type.

    Args:
        x: The PyTorch array that will be converted.
        array_type: Type to which the PyTorch tensor will be converted.
            Expected as one of these strings: "jax", "torch", "numpy".
    Returns:
        The array of the specified type. Can be a JAX array, a numpy array,
        or PyTorch tensor.
    Raises:
        ValueError: if the array type cannot be determined.
    """
    if array_type == "torch":
        return x
    elif array_type == "jax":
        return torch_to_jax(x)
    elif array_type == "numpy":
        return x.cpu().numpy()
    else:
        raise ValueError(f"Unrecognized array type: {array_type}")

convert_to_torch(x)

Convert the given array to PyTorch tensor.

Parameters:

Name Type Description Default
x Any

Array to be converted. Can be a JAX array, a numpy array, a PyTorch tensor (in which case the input tensor will be returned as it is) or any Iterable object.

required

Returns:

Type Description
Tensor

The PyTorch counterpart of the given array.

Source code in evotorch/neuroevolution/net/vecrl.py
def convert_to_torch(x: Any) -> torch.Tensor:
    """
    Convert the given array to PyTorch tensor.

    Args:
        x: Array to be converted. Can be a JAX array, a numpy array,
            a PyTorch tensor (in which case the input tensor will be
            returned as it is) or any Iterable object.
    Returns:
        The PyTorch counterpart of the given array.
    """
    if isinstance(x, torch.Tensor):
        return x
    elif is_jax_array(x):
        return jax_to_torch(x)
    elif isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    else:
        return torch.as_tensor(x)

convert_to_torch_bool(x)

Convert the given array to a PyTorch tensor of bools.

If the given object is an array of floating point numbers, then, values that are near to 0.0 (with a tolerance of 1e-4) will be converted to False, and the others will be converted to True. If the given object is an array of integers, then zero values will be converted to False, and non-zero values will be converted to True. If the given object is an array of booleans, then no change will be made to those boolean values.

The given object can be a JAX array, a numpy array, or a PyTorch tensor. The result will always be a PyTorch tensor.

Parameters:

Name Type Description Default
x Any

Array to be converted.

required

Returns:

Type Description
Tensor

The array converted to a PyTorch tensor with its dtype set as bool.

Source code in evotorch/neuroevolution/net/vecrl.py
def convert_to_torch_bool(x: Any) -> torch.Tensor:
    """
    Convert the given array to a PyTorch tensor of bools.

    If the given object is an array of floating point numbers, then, values
    that are near to 0.0 (with a tolerance of 1e-4) will be converted to
    False, and the others will be converted to True.
    If the given object is an array of integers, then zero values will be
    converted to False, and non-zero values will be converted to True.
    If the given object is an array of booleans, then no change will be made
    to those boolean values.

    The given object can be a JAX array, a numpy array, or a PyTorch tensor.
    The result will always be a PyTorch tensor.

    Args:
        x: Array to be converted.
    Returns:
        The array converted to a PyTorch tensor with its dtype set as bool.
    """
    x = convert_to_torch(x)
    if x.dtype == torch.bool:
        pass  # nothing to do
    elif "float" in str(x.dtype):
        x = torch.abs(x) > 1e-4
    else:
        x = torch.as_tensor(x, dtype=torch.bool)

    return x
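
For example, under the conversion rules described above:

```python
import numpy as np

from evotorch.neuroevolution.net.vecrl import convert_to_torch_bool

# Floats within 1e-4 of 0.0 become False, everything else becomes True.
convert_to_torch_bool(np.array([0.0, 1e-6, 0.5]))  # -> tensor([False, False,  True])

# Integers: zero becomes False, non-zero becomes True.
convert_to_torch_bool(np.array([0, 3]))  # -> tensor([False,  True])
```
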

make_brax_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, **kwargs)

Make a brax environment and wrap it via TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the brax environment, as string (e.g. "humanoid"). If the string starts with "old::" (e.g. "old::humanoid", etc.), then the environment will be made using the namespace brax.v1 (which was introduced in brax version 0.9.0 where the updated implementations of the environments became default and the classical ones moved into brax.v1). You can use the prefix "old::" for reproducing previous results that were obtained or reported using an older version of brax.

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Batch size for the vectorized environment.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}

Returns:

Type Description
TorchWrapper

The brax environment, wrapped by TorchWrapper.

Source code in evotorch/neuroevolution/net/vecrl.py
def make_brax_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    **kwargs,
) -> TorchWrapper:
    """
    Make a brax environment and wrap it via TorchWrapper.

    Args:
        env_name: Name of the brax environment, as string (e.g. "humanoid").
            If the string starts with "old::" (e.g. "old::humanoid", etc.),
            then the environment will be made using the namespace `brax.v1`
            (which was introduced in brax version 0.9.0 where the updated
            implementations of the environments became default and the classical
            ones moved into `brax.v1`).
            You can use the prefix "old::" for reproducing previous results
            that were obtained or reported using an older version of brax.
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Batch size for the vectorized environment.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The brax environment, wrapped by TorchWrapper.
    """

    if brax is not None:
        config = {}
        config.update(kwargs)
        if num_envs is not None:
            config["num_envs"] = num_envs
        env = VectorEnvFromBrax(env_name, **config)
        env = TorchWrapper(
            env,
            force_classic_api=force_classic_api,
            discrete_to_continuous_act=discrete_to_continuous_act,
            clip_actions=clip_actions,
        )
        return env
    else:
        _brax_is_missing()
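
A minimal sketch, assuming brax is installed ("humanoid" is an illustrative environment name):

```python
from evotorch.neuroevolution.net.vecrl import make_brax_env

env = make_brax_env("humanoid", num_envs=64)
obs, info = env.reset()  # obs: a tensor whose leftmost size is 64
```
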

make_gym_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, **kwargs)

Make gymnasium environments and wrap them via SyncVectorEnv and TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the gymnasium environment, as string (e.g. "Humanoid-v4").

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Batch size for the vectorized environment.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}

Returns:

Type Description
TorchWrapper

The gymnasium environments, wrapped by a TorchWrapper.

Source code in evotorch/neuroevolution/net/vecrl.py
def make_gym_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    **kwargs,
) -> TorchWrapper:
    """
    Make gymnasium environments and wrap them via SyncVectorEnv and TorchWrapper.

    Args:
        env_name: Name of the gymnasium environment, as string (e.g. "Humanoid-v4").
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Batch size for the vectorized environment.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The gymnasium environments, wrapped by a TorchWrapper.
    """

    def make_the_env():
        return gym.make(env_name, **kwargs)

    env_fns = [make_the_env for _ in range(num_envs)]
    vec_env = TorchWrapper(
        SyncVectorEnv(env_fns),
        force_classic_api=force_classic_api,
        discrete_to_continuous_act=discrete_to_continuous_act,
        clip_actions=clip_actions,
    )

    return vec_env
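
A minimal sketch ("CartPole-v1" is an illustrative environment name):

```python
from evotorch.neuroevolution.net.vecrl import make_gym_env

# Build 8 synchronized copies of a gymnasium environment.
vec_env = make_gym_env("CartPole-v1", num_envs=8)
obs, info = vec_env.reset()  # obs: a tensor whose leftmost size is 8
```
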

make_vector_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, **kwargs)

Make a new vectorized environment and wrap it via TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the environment, as string. If the string starts with "gym::" (e.g. "gym::Humanoid-v4", etc.), then it is assumed that the target environment is a traditional non-vectorized gymnasium environment. This non-vectorized environment will first be duplicated and wrapped via a SyncVectorEnv so that it gains a vectorized interface, and then, it will be wrapped via TorchWrapper. If the string starts with "brax::" (e.g. "brax::humanoid", etc.), then it is assumed that the target environment is a brax environment which will be wrapped via TorchWrapper. If the string starts with "brax::old::" (e.g. "brax::old::humanoid", etc.), then the environment will be made using the namespace brax.v1 (which was introduced in brax version 0.9.0 where the updated implementations of the environments became default and the classical ones moved into brax.v1). You can use the prefix "brax::old::" for reproducing previous results that were obtained or reported using an older version of brax. If the string does not contain "::" at all (e.g. "Humanoid-v4"), then it is assumed that the target environment is a gymnasium environment. Therefore, "gym::Humanoid-v4" and "Humanoid-v4" are equivalent.

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Batch size for the vectorized environment.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}

Returns:

Type Description
TorchWrapper

The vectorized gymnasium environment, wrapped by TorchWrapper.

Source code in evotorch/neuroevolution/net/vecrl.py
def make_vector_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    **kwargs,
) -> TorchWrapper:
    """
    Make a new vectorized environment and wrap it via TorchWrapper.

    Args:
        env_name: Name of the environment, as string.
            If the string starts with "gym::" (e.g. "gym::Humanoid-v4", etc.),
            then it is assumed that the target environment is a traditional
            non-vectorized gymnasium environment. This non-vectorized
            environment will first be duplicated and wrapped via a `SyncVectorEnv` so that
            it gains a vectorized interface, and then, it will be wrapped via
            `TorchWrapper`.
            If the string starts with "brax::" (e.g. "brax::humanoid", etc.),
            then it is assumed that the target environment is a brax
            environment which will be wrapped via TorchWrapper.
            If the string starts with "brax::old::" (e.g.
            "brax::old::humanoid", etc.), then the environment will be made
            using the namespace `brax.v1` (which was introduced in brax
            version 0.9.0 where the updated implementations of the environments
            became default and the classical ones moved into `brax.v1`).
            You can use the prefix "brax::old::" for reproducing previous
            results that were obtained or reported using an older version of
            brax.
            If the string does not contain "::" at all (e.g. "Humanoid-v4"),
            then it is assumed that the target environment is a gymnasium
            environment. Therefore, "gym::Humanoid-v4" and "Humanoid-v4"
            are equivalent.
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Batch size for the vectorized environment.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The vectorized gymnasium environment, wrapped by TorchWrapper.
    """

    env_parts = str(env_name).split("::", maxsplit=1)

    if len(env_parts) == 0:
        raise ValueError(f"Invalid value for `env_name`: {repr(env_name)}")
    elif len(env_parts) == 1:
        fn = make_gym_env
    elif len(env_parts) == 2:
        env_name = env_parts[1]
        if env_parts[0] == "gym":
            fn = make_gym_env
        elif env_parts[0] == "brax":
            fn = make_brax_env
        else:
            invalid_value = env_parts[0] + "::"
            raise ValueError(
                f"The argument `env_name` starts with {repr(invalid_value)}, implying that the environment is stored"
                f" in a registry named {repr(env_parts[0])}."
                f" However, the registry {repr(env_parts[0])} is not recognized."
                f" Supported environment registries are: 'gym', 'brax'."
            )
    else:
        assert False, "Unexpected value received from len(env_parts)"

    return fn(
        env_name,
        force_classic_api=force_classic_api,
        num_envs=num_envs,
        discrete_to_continuous_act=discrete_to_continuous_act,
        clip_actions=clip_actions,
        **kwargs,
    )
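
A short sketch of the registry prefixes described above:

```python
from evotorch.neuroevolution.net.vecrl import make_vector_env

# The following two calls are equivalent, since gymnasium is the default registry:
env_a = make_vector_env("Humanoid-v4", num_envs=4)
env_b = make_vector_env("gym::Humanoid-v4", num_envs=4)

# A brax environment instead (requires brax to be installed):
# env_c = make_vector_env("brax::humanoid", num_envs=4)
```
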

reset_tensors(x, indices)

Reset the specified regions of the given tensor(s) as 0.

Note that the resetting is performed in-place, which means, the provided tensors are modified.

The regions are determined by the argument indices, which can be a sequence of booleans (in which case it is interpreted as a mask), or a sequence of integers (in which case it is interpreted as the list of indices).

For example, let us imagine that we have the following tensor:

import torch

x = torch.tensor(
    [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ],
    dtype=torch.float32,
)

If we wish to reset the rows with indices 0 and 2, we could use:

reset_tensors(x, [0, 2])

The new value of x would then be:

torch.tensor(
    [
        [0, 0, 0, 0],
        [4, 5, 6, 7],
        [0, 0, 0, 0],
        [12, 13, 14, 15],
    ],
    dtype=torch.float32,
)

The first argument does not have to be a single tensor. Instead, it can be a container (i.e. a dictionary-like object or an iterable) that stores tensors. In this case, each tensor stored by the container will be subject to resetting. In more details, each tensor within the iterable(s) and each tensor within the value part of the dictionary-like object(s) will be reset.

As an example, let us assume that we have the following collection:

a = torch.tensor(
    [
        [0, 1],
        [2, 3],
        [4, 5],
    ],
    dtype=torch.float32,
)

b = torch.tensor(
    [
        [0, 10, 20],
        [30, 40, 50],
        [60, 70, 80],
    ],
    dtype=torch.float32,
)

c = torch.tensor(
    [
        [100],
        [200],
        [300],
    ],
    dtype=torch.float32,
)

d = torch.tensor([-1, -2, -3], dtype=torch.float32)

my_tensors = [a, {"1": b, "2": (c, d)}]

To clear the regions with indices, e.g., (1, 2), we could do:

reset_tensors(my_tensors, [1, 2])

and the result would be:

>>> print(a)
torch.tensor(
    [
        [0, 1],
        [0, 0],
        [0, 0],
    ],
    dtype=torch.float32,
)

>>> print(b)
torch.tensor(
    [
        [0, 10, 20],
        [0, 0, 0],
        [0, 0, 0],
    ],
    dtype=torch.float32,
)

>>> print(c)
torch.tensor(
    [
        [100],
        [0],
        [0],
    ],
    dtype=torch.float32,
)

>>> print(d)
torch.tensor([-1, 0, 0], dtype=torch.float32)

Parameters:

Name Type Description Default
x Any

A tensor or a collection of tensors, whose values are subject to resetting.

required
indices Union[int, Iterable]

A sequence of integers or booleans, specifying which regions of the tensor(s) will be reset.

required
Source code in evotorch/neuroevolution/net/vecrl.py
def reset_tensors(x: Any, indices: MaskOrIndices):
    """
    Reset the specified regions of the given tensor(s) as 0.

    Note that the resetting is performed in-place, which means, the provided tensors are modified.

    The regions are determined by the argument `indices`, which can be a sequence of booleans (in which case it is
    interpreted as a mask), or a sequence of integers (in which case it is interpreted as the list of indices).

    For example, let us imagine that we have the following tensor:

    ```python
    import torch

    x = torch.tensor(
        [
            [0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11],
            [12, 13, 14, 15],
        ],
        dtype=torch.float32,
    )
    ```

    If we wish to reset the rows with indices 0 and 2, we could use:

    ```python
    reset_tensors(x, [0, 2])
    ```

    The new value of `x` would then be:

    ```
    torch.tensor(
        [
            [0, 0, 0, 0],
            [4, 5, 6, 7],
            [0, 0, 0, 0],
            [12, 13, 14, 15],
        ],
        dtype=torch.float32,
    )
    ```

    The first argument does not have to be a single tensor.
    Instead, it can be a container (i.e. a dictionary-like object or an iterable) that stores tensors.
    In this case, each tensor stored by the container will be subject to resetting.
    In more details, each tensor within the iterable(s) and each tensor within the value part of the dictionary-like
    object(s) will be reset.

    As an example, let us assume that we have the following collection:

    ```python
    a = torch.tensor(
        [
            [0, 1],
            [2, 3],
            [4, 5],
        ],
        dtype=torch.float32,
    )

    b = torch.tensor(
        [
            [0, 10, 20],
            [30, 40, 50],
            [60, 70, 80],
        ],
        dtype=torch.float32,
    )

    c = torch.tensor(
        [
            [100],
            [200],
            [300],
        ],
        dtype=torch.float32,
    )

    d = torch.tensor([-1, -2, -3], dtype=torch.float32)

    my_tensors = [a, {"1": b, "2": (c, d)}]
    ```

    To clear the regions with indices, e.g., (1, 2), we could do:

    ```python
    reset_tensors(my_tensors, [1, 2])
    ```

    and the result would be:

    ```
    >>> print(a)
    torch.tensor(
        [
            [0, 1],
            [0, 0],
            [0, 0],
        ],
        dtype=torch.float32,
    )

    >>> print(b)
    torch.tensor(
        [
            [0, 10, 20],
            [0, 0, 0],
            [0, 0, 0],
        ],
        dtype=torch.float32,
    )

    >>> print(c)
    torch.tensor(
        [
            [100],
            [0],
            [0],
        ],
        dtype=torch.float32,
    )

    >>> print(d)
    torch.tensor([-1, 0, 0], dtype=torch.float32)
    ```

    Args:
        x: A tensor or a collection of tensors, whose values are subject to resetting.
        indices: A sequence of integers or booleans, specifying which regions of the tensor(s) will be reset.
    """
    if isinstance(x, torch.Tensor):
        # If the first argument is a tensor, then we clear it according to the indices we received.
        x[indices] = 0
    elif isinstance(x, (str, bytes, bytearray)):
        # str, bytes, and bytearray are the types of `Iterable` that we do not wish to process.
        # Therefore, we explicitly add a condition for them here, and explicitly state that nothing should be done
        # when instances of them are encountered.
        pass
    elif isinstance(x, Mapping):
        # If the first argument is a Mapping (i.e. a dictionary-like object), then, for each value part of the
        # Mapping instance, we call this function itself.
        for key, value in x.items():
            reset_tensors(value, indices)
    elif isinstance(x, Iterable):
        # If the first argument is an Iterable (e.g. a list, a tuple, etc.), then, for each value contained by this
        # Iterable instance, we call this function itself.
        for value in x:
            reset_tensors(value, indices)
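
A typical use of this function in vectorized reinforcement learning is clearing the recurrent hidden states of the policies whose episodes have just ended. Below is a minimal usage sketch, assuming that `reset_tensors` can be imported from `evotorch.neuroevolution.net.vecrl` (the source path shown above); the tensor names are illustrative:

```python
import torch

from evotorch.neuroevolution.net.vecrl import reset_tensors

# Hidden states of 4 recurrent policies, each of length 8 (illustrative).
hidden_states = torch.randn(4, 8)

# Boolean mask marking the environments whose episodes just ended.
done = torch.tensor([True, False, True, False])

# A sequence of booleans is interpreted as a mask, so rows 0 and 2 are
# zeroed in-place, while rows 1 and 3 are left untouched.
reset_tensors(hidden_states, done)

assert torch.all(hidden_states[done] == 0)
```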

supervisedne

SupervisedNE (NEProblem)

Representation of a neuro-evolution problem where the goal is to minimize a loss function in a supervised learning setting.

A supervised learning problem can be defined via subclassing this class and overriding the methods _loss(y_hat, y) (which is to define how the loss is computed) and _make_dataloader() (which is to define how a new DataLoader is created).

Alternatively, this class can be directly instantiated as follows:

def my_loss_function(output_of_network, desired_output):
    loss = ...  # compute the loss here
    return loss


problem = SupervisedNE(
    my_dataset, MyTorchModuleClass, my_loss_function, minibatch_size=..., ...
)
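
For a more concrete picture, here is a minimal instantiation sketch for a toy regression task; the dataset, the network factory, and the hyperparameter values are illustrative assumptions, not requirements of the class:

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset

from evotorch.neuroevolution import SupervisedNE

# Illustrative toy regression data: y is a noisy linear function of x.
x = torch.randn(256, 3)
y = x.sum(dim=-1, keepdim=True) + 0.1 * torch.randn(256, 1)
dataset = TensorDataset(x, y)

problem = SupervisedNE(
    dataset,
    network=lambda: nn.Linear(3, 1),   # a Callable returning an nn.Module
    loss_func=nn.functional.mse_loss,  # receives (y_hat, y), returns a scalar
    minibatch_size=32,
)
```
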
Source code in evotorch/neuroevolution/supervisedne.py
class SupervisedNE(NEProblem):
    """
    Representation of a neuro-evolution problem where the goal is to minimize
    a loss function in a supervised learning setting.

    A supervised learning problem can be defined via subclassing this class
    and overriding the methods
    `_loss(y_hat, y)` (which is to define how the loss is computed)
    and `_make_dataloader()` (which is to define how a new DataLoader is
    created).

    Alternatively, this class can be directly instantiated as follows:

    ```python
    def my_loss_function(output_of_network, desired_output):
        loss = ...  # compute the loss here
        return loss


    problem = SupervisedNE(
        my_dataset, MyTorchModuleClass, my_loss_function, minibatch_size=..., ...
    )
    ```
    """

    def __init__(
        self,
        dataset: Dataset,
        network: Union[str, nn.Module, Callable[[], nn.Module]],
        loss_func: Optional[Callable] = None,
        *,
        network_args: Optional[dict] = None,
        initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
        minibatch_size: Optional[int] = None,
        num_minibatches: Optional[int] = None,
        num_actors: Optional[Union[int, str]] = None,
        common_minibatch: bool = True,
        num_gpus_per_actor: Optional[Union[int, float, str]] = None,
        actor_config: Optional[dict] = None,
        num_subbatches: Optional[int] = None,
        subbatch_size: Optional[int] = None,
        device: Optional[Device] = None,
    ):
        """
        `__init__(...)`: Initialize the SupervisedNE.

        Args:
            dataset: The Dataset from which the minibatches will be pulled
            network: A network structure string, or a Callable (which can be
                a class inheriting from `torch.nn.Module`, or a function
                which returns a `torch.nn.Module` instance), or an instance
                of `torch.nn.Module`.
                The object provided here determines the structure of the
                neural network whose parameters will be evolved.
                A network structure string is a string which can be processed
                by `evotorch.neuroevolution.net.str_to_net(...)`.
                Please see the documentation of the function
                `evotorch.neuroevolution.net.str_to_net(...)` to see what
                such a neural network structure string looks like.
            loss_func: Optionally a function (or a Callable object) which
                receives `y_hat` (the output generated by the neural network)
                and `y` (the desired output), and returns the loss as a
                scalar.
                This argument can also be left as None, in which case it will
                be expected that the method `_loss(self, y_hat, y)` is
                overridden by the inheriting class.
            network_args: Optionally a dict-like object, storing keyword
                arguments to be passed to the network while instantiating it.
            initial_bounds: Specifies an interval from which the values of the
                initial neural network parameters will be drawn.
            minibatch_size: Optionally an integer, describing the size of a
                minibatch when pulling data from the dataset.
                Can also be left as None, in which case it will be expected
                that the inheriting class overrides the method
                `_make_dataloader()` and defines how a new DataLoader is to be
                made.
            num_minibatches: An integer, specifying over how many minibatches
                a single neural network will be evaluated.
                If not specified, it will be assumed that the desired number
                of minibatches per network evaluation is 1.
            num_actors: Number of actors to create for parallelized
                evaluation of the solutions.
                Certain string values are also accepted.
                When given as "max" or as "num_cpus", the number of actors
                will be equal to the number of all available CPUs in the ray
                cluster.
                When given as "num_gpus", the number of actors will be
                equal to the number of all available GPUs in the ray
                cluster, and each actor will be assigned a GPU.
                When given as "num_devices", the number of actors will be
                equal to the minimum among the number of CPUs and the number
                of GPUs available in the cluster (or will be equal to the
                number of CPUs if there is no GPU), and each actor will be
                assigned a GPU (if available).
                If `num_actors` is given as "num_gpus" or "num_devices",
                the argument `num_gpus_per_actor` must not be used,
                and the `actor_config` dictionary must not contain the
                key "num_gpus".
                If `num_actors` is given as something other than "num_gpus"
                or "num_devices", and if you wish to assign GPUs to each
                actor, then please see the argument `num_gpus_per_actor`.
            common_minibatch: Whether the same minibatches will be
                used when evaluating the solutions or not.
            actor_config: A dictionary, representing the keyword arguments
                to be passed to the options(...) used when creating the
                ray actor objects. To be used for explicitly allocating
                resources per each actor.
                For example, for declaring that each actor is to use a GPU,
                one can pass `actor_config=dict(num_gpus=1)`.
                Can also be given as None (which is the default),
                if no such options are to be passed.
            num_gpus_per_actor: Number of GPUs to be allocated by each
                remote actor.
                The default behavior is to NOT allocate any GPU at all
                (which is the default behavior of the ray library as well).
                When given as a number `n`, each actor will be given
                `n` GPUs (where `n` can be an integer, or can be a `float`
                for fractional allocation).
                When given as a string "max", then the available GPUs
                across the entire ray cluster (or within the local computer
                in the simplest cases) will be equally distributed among
                the actors.
                When given as a string "all", then each actor will have
                access to all the GPUs (this will be achieved by suppressing
                the environment variable `CUDA_VISIBLE_DEVICES` for each
                actor).
                When the problem is not distributed (i.e. when there are
                no actors), this argument is expected to be left as None.
            num_subbatches: If `num_subbatches` is None (assuming that
                `subbatch_size` is also None), then, when evaluating a
                population, the population will be split into n pieces, `n`
                being the number of actors, and each actor will evaluate
                its assigned piece. If `num_subbatches` is an integer `m`,
                then the population will be split into `m` pieces,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
                While using a distributed algorithm, this argument determines
                how many sub-batches will be generated, and therefore,
                how many gradients will be computed by the remote actors.
            subbatch_size: If `subbatch_size` is None (assuming that
                `num_subbatches` is also None), then, when evaluating a
                population, the population will be split into `n` pieces, `n`
                being the number of actors, and each actor will evaluate its
                assigned piece. If `subbatch_size` is an integer `m`,
                then the population will be split into pieces of size `m`,
                and actors will continually accept the next unevaluated
                piece as they finish their current tasks.
                When there can be significant difference across the solutions
                in terms of computational requirements, specifying a
                `subbatch_size` can be beneficial, because, while one
                actor is busy with a subbatch containing computationally
                challenging solutions, other actors can accept more
                tasks and save time.
                The arguments `num_subbatches` and `subbatch_size` cannot
                be given values other than None at the same time.
                While using a distributed algorithm, this argument determines
                the size of a sub-batch (or sub-population) sampled by a
                remote actor for computing a gradient.
                In distributed mode, it is expected that the population size
                is divisible by `subbatch_size`.
            device: Default device in which a new population will be generated
                and the neural networks will operate.
                If not specified, "cpu" will be used.
        """
        super().__init__(
            objective_sense="min",
            network=network,
            network_args=network_args,
            initial_bounds=initial_bounds,
            num_actors=num_actors,
            num_gpus_per_actor=num_gpus_per_actor,
            actor_config=actor_config,
            num_subbatches=num_subbatches,
            subbatch_size=subbatch_size,
            device=device,
        )

        self.dataset = dataset
        self.dataloader: Optional[DataLoader] = None
        self.dataloader_iterator = None

        self._loss_func = loss_func
        self._minibatch_size = None if minibatch_size is None else int(minibatch_size)
        self._num_minibatches = 1 if num_minibatches is None else int(num_minibatches)
        self._common_minibatch = common_minibatch
        self._current_minibatches: Optional[list] = None

    def _make_dataloader(self) -> DataLoader:
        """
        Make a new DataLoader.

        This method, in its default state, does not contain an implementation.
        In the case where the `__init__` of `SupervisedNE` is not provided
        with a minibatch size, it will be expected that this method is
        overridden by the inheriting class and that the operation of creating
        a new DataLoader is defined here.

        Returns:
            The new DataLoader.
        """
        raise NotImplementedError

    def make_dataloader(self) -> DataLoader:
        """
        Make a new DataLoader.

        If the `__init__` of `SupervisedNE` was provided with a minibatch size
        via the argument `minibatch_size`, then a new DataLoader will be made
        with that minibatch size.
        Otherwise, it will be expected that the method `_make_dataloader(...)`
        was overridden to contain details regarding how the DataLoader should be
        created, and that method will be executed.

        Returns:
            The created DataLoader.
        """
        if self._minibatch_size is None:
            return self._make_dataloader()
        else:
            return DataLoader(self.dataset, shuffle=True, batch_size=self._minibatch_size)

    def _evaluate_using_minibatch(self, network: nn.Module, batch: Any) -> Union[float, torch.Tensor]:
        """
        Pass a minibatch through a network, and compute the loss.

        Args:
            network: The network using which the loss will be computed.
            batch: The minibatch that will be used as data.
        Returns:
            The loss.
        """
        with torch.no_grad():
            x, y = batch
            yhat = network(x)
            return self.loss(yhat, y)

    def _loss(self, y_hat: Any, y: Any) -> Union[float, torch.Tensor]:
        """
        The loss function.

        This method, in its default state, does not contain an implementation.
        In the case where the `__init__` of the `SupervisedNE` class was not given
        a loss function via the argument `loss_func`, it will be expected
        that this method is overridden by the inheriting class and that the
        operation of computing the loss is defined here.

        Args:
            y_hat: The output estimated by the network
            y: The desired output
        Returns:
            A scalar, representing the loss
        """
        raise NotImplementedError

    def loss(self, y_hat: Any, y: Any) -> Union[float, torch.Tensor]:
        """
        Run the loss function and return the loss.

        If the `__init__` of the `SupervisedNE` class was given a loss
        function via the argument `loss_func`, then that loss function
        will be used. Otherwise, it will be expected that the method
        `_loss(...)` is overridden with a loss definition, and that method
        will be used to compute the loss.
        The computed loss will be returned.

        Args:
            y_hat: The output estimated by the network
            y: The desired output
        Returns:
            A scalar, representing the loss
        """
        if self._loss_func is None:
            return self._loss(y_hat, y)
        else:
            return self._loss_func(y_hat, y)

    def _prepare(self) -> None:
        self.dataloader = self.make_dataloader()

    def get_minibatch(self) -> Any:
        """
        Get the next minibatch from the DataLoader.
        """
        if self.dataloader is None:
            self._prepare()

        if self.dataloader_iterator is None:
            self.dataloader_iterator = iter(self.dataloader)

        batch = None
        try:
            batch = next(self.dataloader_iterator)
        except StopIteration:
            pass

        if batch is None:
            self.dataloader_iterator = iter(self.dataloader)
            batch = next(self.dataloader_iterator)

        # Move batch to device of network
        return [var.to(self.network_device) for var in batch]

    def _evaluate_network(self, network: nn.Module) -> torch.Tensor:
        loss = 0.0
        for batch_idx in range(self._num_minibatches):
            if not self._common_minibatch:
                self._current_minibatch = self.get_minibatch()
            else:
                self._current_minibatch = self._current_minibatches[batch_idx]
            loss += self._evaluate_using_minibatch(network, self._current_minibatch) / self._num_minibatches
        return loss

    def _evaluate_batch(self, batch: SolutionBatch):
        if self._common_minibatch:
            # If using a common data batch, generate them now and use them for the entire batch of solutions
            self._current_minibatches = [self.get_minibatch() for _ in range(self._num_minibatches)]
        return super()._evaluate_batch(batch)
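
Once constructed, a `SupervisedNE` instance can be paired with any EvoTorch search algorithm. A minimal sketch, assuming the `problem` object built above and using SNES with an illustrative initial standard deviation:

```python
from evotorch.algorithms import SNES
from evotorch.logging import StdOutLogger

searcher = SNES(problem, stdev_init=0.01)
StdOutLogger(searcher)  # print the evolution progress to stdout

# Since the problem's objective_sense is "min", fitness here is the loss.
searcher.run(10)
```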

__init__(self, dataset, network, loss_func=None, *, network_args=None, initial_bounds=(-1e-05, 1e-05), minibatch_size=None, num_minibatches=None, num_actors=None, common_minibatch=True, num_gpus_per_actor=None, actor_config=None, num_subbatches=None, subbatch_size=None, device=None) special

__init__(...): Initialize the SupervisedNE.

Parameters:

- `dataset` (`Dataset`, required): The Dataset from which the minibatches will be pulled.
- `network` (`Union[str, torch.nn.modules.module.Module, Callable[[], torch.nn.modules.module.Module]]`, required): A network structure string, or a Callable (which can be a class inheriting from `torch.nn.Module`, or a function which returns a `torch.nn.Module` instance), or an instance of `torch.nn.Module`. The object provided here determines the structure of the neural network whose parameters will be evolved. A network structure string is a string which can be processed by `evotorch.neuroevolution.net.str_to_net(...)`. Please see the documentation of the function `evotorch.neuroevolution.net.str_to_net(...)` to see what such a neural network structure string looks like.
- `loss_func` (`Optional[Callable]`, default: `None`): Optionally a function (or a Callable object) which receives `y_hat` (the output generated by the neural network) and `y` (the desired output), and returns the loss as a scalar. This argument can also be left as None, in which case it will be expected that the method `_loss(self, y_hat, y)` is overridden by the inheriting class.
- `network_args` (`Optional[dict]`, default: `None`): Optionally a dict-like object, storing keyword arguments to be passed to the network while instantiating it.
- `initial_bounds` (`Union[Iterable[Union[float, Iterable[float], torch.Tensor]], evotorch.core.BoundsPair]`, default: `(-1e-05, 1e-05)`): Specifies an interval from which the values of the initial neural network parameters will be drawn.
- `minibatch_size` (`Optional[int]`, default: `None`): Optionally an integer, describing the size of a minibatch when pulling data from the dataset. Can also be left as None, in which case it will be expected that the inheriting class overrides the method `_make_dataloader()` and defines how a new DataLoader is to be made.
- `num_minibatches` (`Optional[int]`, default: `None`): An integer, specifying over how many minibatches a single neural network will be evaluated. If not specified, it will be assumed that the desired number of minibatches per network evaluation is 1.
- `num_actors` (`Union[int, str]`, default: `None`): Number of actors to create for parallelized evaluation of the solutions. Certain string values are also accepted. When given as "max" or as "num_cpus", the number of actors will be equal to the number of all available CPUs in the ray cluster. When given as "num_gpus", the number of actors will be equal to the number of all available GPUs in the ray cluster, and each actor will be assigned a GPU. When given as "num_devices", the number of actors will be equal to the minimum among the number of CPUs and the number of GPUs available in the cluster (or will be equal to the number of CPUs if there is no GPU), and each actor will be assigned a GPU (if available). If `num_actors` is given as "num_gpus" or "num_devices", the argument `num_gpus_per_actor` must not be used, and the `actor_config` dictionary must not contain the key "num_gpus". If `num_actors` is given as something other than "num_gpus" or "num_devices", and if you wish to assign GPUs to each actor, then please see the argument `num_gpus_per_actor`.
- `common_minibatch` (`bool`, default: `True`): Whether the same minibatches will be used when evaluating the solutions or not.
- `actor_config` (`Optional[dict]`, default: `None`): A dictionary, representing the keyword arguments to be passed to the options(...) used when creating the ray actor objects. To be used for explicitly allocating resources per each actor. For example, for declaring that each actor is to use a GPU, one can pass `actor_config=dict(num_gpus=1)`. Can also be given as None (which is the default), if no such options are to be passed.
- `num_gpus_per_actor` (`Union[int, float, str]`, default: `None`): Number of GPUs to be allocated by each remote actor. The default behavior is to NOT allocate any GPU at all (which is the default behavior of the ray library as well). When given as a number `n`, each actor will be given `n` GPUs (where `n` can be an integer, or can be a `float` for fractional allocation). When given as a string "max", then the available GPUs across the entire ray cluster (or within the local computer in the simplest cases) will be equally distributed among the actors. When given as a string "all", then each actor will have access to all the GPUs (this will be achieved by suppressing the environment variable `CUDA_VISIBLE_DEVICES` for each actor). When the problem is not distributed (i.e. when there are no actors), this argument is expected to be left as None.
- `num_subbatches` (`Optional[int]`, default: `None`): If `num_subbatches` is None (assuming that `subbatch_size` is also None), then, when evaluating a population, the population will be split into `n` pieces, `n` being the number of actors, and each actor will evaluate its assigned piece. If `num_subbatches` is an integer `m`, then the population will be split into `m` pieces, and actors will continually accept the next unevaluated piece as they finish their current tasks. The arguments `num_subbatches` and `subbatch_size` cannot be given values other than None at the same time. While using a distributed algorithm, this argument determines how many sub-batches will be generated, and therefore, how many gradients will be computed by the remote actors.
- `subbatch_size` (`Optional[int]`, default: `None`): If `subbatch_size` is None (assuming that `num_subbatches` is also None), then, when evaluating a population, the population will be split into `n` pieces, `n` being the number of actors, and each actor will evaluate its assigned piece. If `subbatch_size` is an integer `m`, then the population will be split into pieces of size `m`, and actors will continually accept the next unevaluated piece as they finish their current tasks. When there can be significant difference across the solutions in terms of computational requirements, specifying a `subbatch_size` can be beneficial, because, while one actor is busy with a subbatch containing computationally challenging solutions, other actors can accept more tasks and save time. The arguments `num_subbatches` and `subbatch_size` cannot be given values other than None at the same time. While using a distributed algorithm, this argument determines the size of a sub-batch (or sub-population) sampled by a remote actor for computing a gradient. In distributed mode, it is expected that the population size is divisible by `subbatch_size`.
- `device` (`Union[str, torch.device]`, default: `None`): Default device in which a new population will be generated and the neural networks will operate. If not specified, "cpu" will be used.
Source code in evotorch/neuroevolution/supervisedne.py
def __init__(
    self,
    dataset: Dataset,
    network: Union[str, nn.Module, Callable[[], nn.Module]],
    loss_func: Optional[Callable] = None,
    *,
    network_args: Optional[dict] = None,
    initial_bounds: Optional[BoundsPairLike] = (-0.00001, 0.00001),
    minibatch_size: Optional[int] = None,
    num_minibatches: Optional[int] = None,
    num_actors: Optional[Union[int, str]] = None,
    common_minibatch: bool = True,
    num_gpus_per_actor: Optional[Union[int, float, str]] = None,
    actor_config: Optional[dict] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    device: Optional[Device] = None,
):
    """
    `__init__(...)`: Initialize the SupervisedNE.

    Args:
        dataset: The Dataset from which the minibatches will be pulled
        network: A network structure string, or a Callable (which can be
            a class inheriting from `torch.nn.Module`, or a function
            which returns a `torch.nn.Module` instance), or an instance
            of `torch.nn.Module`.
            The object provided here determines the structure of the
            neural network whose parameters will be evolved.
            A network structure string is a string which can be processed
            by `evotorch.neuroevolution.net.str_to_net(...)`.
            Please see the documentation of the function
            `evotorch.neuroevolution.net.str_to_net(...)` to see what
            such a neural network structure string looks like.
        loss_func: Optionally a function (or a Callable object) which
            receives `y_hat` (the output generated by the neural network)
            and `y` (the desired output), and returns the loss as a
            scalar.
            This argument can also be left as None, in which case it will
            be expected that the method `_loss(self, y_hat, y)` is
            overridden by the inheriting class.
        network_args: Optionally a dict-like object, storing keyword
            arguments to be passed to the network while instantiating it.
        initial_bounds: Specifies an interval from which the values of the
            initial neural network parameters will be drawn.
        minibatch_size: Optionally an integer, describing the size of a
            minibatch when pulling data from the dataset.
            Can also be left as None, in which case it will be expected
            that the inheriting class overrides the method
            `_make_dataloader()` and defines how a new DataLoader is to be
            made.
        num_minibatches: An integer, specifying over how many minibatches
            a single neural network will be evaluated.
            If not specified, it will be assumed that the desired number
            of minibatches per network evaluation is 1.
        num_actors: Number of actors to create for parallelized
            evaluation of the solutions.
            Certain string values are also accepted.
            When given as "max" or as "num_cpus", the number of actors
            will be equal to the number of all available CPUs in the ray
            cluster.
            When given as "num_gpus", the number of actors will be
            equal to the number of all available GPUs in the ray
            cluster, and each actor will be assigned a GPU.
            When given as "num_devices", the number of actors will be
            equal to the minimum among the number of CPUs and the number
            of GPUs available in the cluster (or will be equal to the
            number of CPUs if there is no GPU), and each actor will be
            assigned a GPU (if available).
            If `num_actors` is given as "num_gpus" or "num_devices",
            the argument `num_gpus_per_actor` must not be used,
            and the `actor_config` dictionary must not contain the
            key "num_gpus".
            If `num_actors` is given as something other than "num_gpus"
            or "num_devices", and if you wish to assign GPUs to each
            actor, then please see the argument `num_gpus_per_actor`.
        common_minibatch: Whether the same minibatches will be
            used when evaluating the solutions or not.
        actor_config: A dictionary, representing the keyword arguments
            to be passed to the options(...) used when creating the
            ray actor objects. To be used for explicitly allocating
            resources per each actor.
            For example, for declaring that each actor is to use a GPU,
            one can pass `actor_config=dict(num_gpus=1)`.
            Can also be given as None (which is the default),
            if no such options are to be passed.
        num_gpus_per_actor: Number of GPUs to be allocated by each
            remote actor.
            The default behavior is to NOT allocate any GPU at all
            (which is the default behavior of the ray library as well).
            When given as a number `n`, each actor will be given
            `n` GPUs (where `n` can be an integer, or can be a `float`
            for fractional allocation).
            When given as a string "max", then the available GPUs
            across the entire ray cluster (or within the local computer
            in the simplest cases) will be equally distributed among
            the actors.
            When given as a string "all", then each actor will have
            access to all the GPUs (this will be achieved by suppressing
            the environment variable `CUDA_VISIBLE_DEVICES` for each
            actor).
            When the problem is not distributed (i.e. when there are
            no actors), this argument is expected to be left as None.
        num_subbatches: If `num_subbatches` is None (assuming that
            `subbatch_size` is also None), then, when evaluating a
            population, the population will be split into n pieces, `n`
            being the number of actors, and each actor will evaluate
            its assigned piece. If `num_subbatches` is an integer `m`,
            then the population will be split into `m` pieces,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
            While using a distributed algorithm, this argument determines
            how many sub-batches will be generated, and therefore,
            how many gradients will be computed by the remote actors.
        subbatch_size: If `subbatch_size` is None (assuming that
            `num_subbatches` is also None), then, when evaluating a
            population, the population will be split into `n` pieces, `n`
            being the number of actors, and each actor will evaluate its
            assigned piece. If `subbatch_size` is an integer `m`,
            then the population will be split into pieces of size `m`,
            and actors will continually accept the next unevaluated
            piece as they finish their current tasks.
            When there can be significant difference across the solutions
            in terms of computational requirements, specifying a
            `subbatch_size` can be beneficial, because, while one
            actor is busy with a subbatch containing computationally
            challenging solutions, other actors can accept more
            tasks and save time.
            The arguments `num_subbatches` and `subbatch_size` cannot
            be given values other than None at the same time.
            While using a distributed algorithm, this argument determines
            the size of a sub-batch (or sub-population) sampled by a
            remote actor for computing a gradient.
            In distributed mode, it is expected that the population size
            is divisible by `subbatch_size`.
        device: Default device in which a new population will be generated
            and the neural networks will operate.
            If not specified, "cpu" will be used.
    """
    super().__init__(
        objective_sense="min",
        network=network,
        network_args=network_args,
        initial_bounds=initial_bounds,
        num_actors=num_actors,
        num_gpus_per_actor=num_gpus_per_actor,
        actor_config=actor_config,
        num_subbatches=num_subbatches,
        subbatch_size=subbatch_size,
        device=device,
    )

    self.dataset = dataset
    self.dataloader: Optional[DataLoader] = None
    self.dataloader_iterator = None

    self._loss_func = loss_func
    self._minibatch_size = None if minibatch_size is None else int(minibatch_size)
    self._num_minibatches = 1 if num_minibatches is None else int(num_minibatches)
    self._common_minibatch = common_minibatch
    self._current_minibatches: Optional[list] = None

get_minibatch(self)

Get the next minibatch from the DataLoader.

Source code in evotorch/neuroevolution/supervisedne.py
def get_minibatch(self) -> Any:
    """
    Get the next minibatch from the DataLoader.
    """
    if self.dataloader is None:
        self._prepare()

    if self.dataloader_iterator is None:
        self.dataloader_iterator = iter(self.dataloader)

    batch = None
    try:
        batch = next(self.dataloader_iterator)
    except StopIteration:
        pass

    if batch is None:
        self.dataloader_iterator = iter(self.dataloader)
        batch = next(self.dataloader_iterator)

    # Move batch to device of network
    return [var.to(self.network_device) for var in batch]

loss(self, y_hat, y)

Run the loss function and return the loss.

If the `__init__` of the `SupervisedNE` class was given a loss function via the argument `loss_func`, then that loss function will be used. Otherwise, it will be expected that the method `_loss(...)` is overridden with a loss definition, and that method will be used to compute the loss. The computed loss will be returned.

Parameters:

- `y_hat` (`Any`, required): The output estimated by the network.
- `y` (`Any`, required): The desired output.

Returns:

- `Union[float, torch.Tensor]`: A scalar, representing the loss.

Source code in evotorch/neuroevolution/supervisedne.py
def loss(self, y_hat: Any, y: Any) -> Union[float, torch.Tensor]:
    """
    Run the loss function and return the loss.

    If the `__init__` of the `SupervisedNE` class was given a loss
    function via the argument `loss_func`, then that loss function
    will be used. Otherwise, it will be expected that the method
    `_loss(...)` is overridden with a loss definition, and that method
    will be used to compute the loss.
    The computed loss will be returned.

    Args:
        y_hat: The output estimated by the network
        y: The desired output
    Returns:
        A scalar, representing the loss
    """
    if self._loss_func is None:
        return self._loss(y_hat, y)
    else:
        return self._loss_func(y_hat, y)
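
As an alternative to passing `loss_func` at construction time, `_loss(...)` can be overridden in a subclass. A minimal sketch (the class name and the choice of cross-entropy are illustrative):

```python
import torch
from evotorch.neuroevolution import SupervisedNE


class ClassificationNE(SupervisedNE):
    def _loss(self, y_hat, y):
        # Any computation returning a scalar loss is acceptable here.
        return torch.nn.functional.cross_entropy(y_hat, y)
```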

make_dataloader(self)

Make a new DataLoader.

If the __init__ of SupervisedNE was provided with a minibatch size via the argument minibatch_size, then a new DataLoader will be made with that minibatch size. Otherwise, it will be expected that the method _make_dataloader(...) was overridden to contain details regarding how the DataLoader should be created, and that method will be executed.

Returns:

- `DataLoader`: The created DataLoader.

Source code in evotorch/neuroevolution/supervisedne.py
def make_dataloader(self) -> DataLoader:
    """
    Make a new DataLoader.

    If the `__init__` of `SupervisedNE` was provided with a minibatch size
    via the argument `minibatch_size`, then a new DataLoader will be made
    with that minibatch size.
    Otherwise, it will be expected that the method `_make_dataloader(...)`
    was overridden to contain details regarding how the DataLoader should be
    created, and that method will be executed.

    Returns:
        The created DataLoader.
    """
    if self._minibatch_size is None:
        return self._make_dataloader()
    else:
        return DataLoader(self.dataset, shuffle=True, batch_size=self._minibatch_size)
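
When a fixed `minibatch_size` is not flexible enough (e.g. a custom sampler, `num_workers`, or `drop_last` behavior is needed), `_make_dataloader()` can be overridden instead. A minimal sketch under that assumption (the keyword argument values are illustrative):

```python
from torch.utils.data import DataLoader

from evotorch.neuroevolution import SupervisedNE


class MySupervisedNE(SupervisedNE):
    def _make_dataloader(self) -> DataLoader:
        # Full control over how the DataLoader is constructed.
        return DataLoader(
            self.dataset,
            batch_size=64,
            shuffle=True,
            drop_last=True,
            num_workers=2,
        )
```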

vecgymne

VecGymNE (BaseNEProblem)

An EvoTorch problem for solving vectorized gym environments
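
A minimal instantiation sketch; the environment name, the network structure string, and the actor count below are illustrative (see the `__init__` documentation below for the full set of options):

```python
from evotorch.neuroevolution import VecGymNE

problem = VecGymNE(
    env="brax::humanoid",                      # a vectorizable environment
    network="Linear(obs_length, act_length)",  # a str_to_net(...) structure string
    observation_normalization=True,
    num_actors=4,
)
```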

Source code in evotorch/neuroevolution/vecgymne.py
class VecGymNE(BaseNEProblem):
    """
    An EvoTorch problem for solving vectorized gym environments
    """

    def __init__(
        self,
        env: Union[str, Callable],
        network: Union[str, Callable, nn.Module],
        *,
        env_config: Optional[Mapping] = None,
        max_num_envs: Optional[int] = None,
        network_args: Optional[Mapping] = None,
        observation_normalization: bool = False,
        decrease_rewards_by: Optional[float] = None,
        alive_bonus_schedule: Optional[tuple] = None,
        action_noise_stdev: Optional[float] = None,
        num_episodes: int = 1,
        device: Optional[Device] = None,
        num_actors: Optional[Union[int, str]] = None,
        num_gpus_per_actor: Optional[int] = None,
        num_subbatches: Optional[int] = None,
        subbatch_size: Optional[int] = None,
        actor_config: Optional[Mapping] = None,
    ):
        """
        Initialize the VecGymNE.

        Args:
            env: Environment to be solved.
                If this is given as a string starting with "gym::" (e.g.
                "gym::Humanoid-v4", etc.), then it is assumed that the target
                environment is a classical gym environment.
                If this is given as a string starting with "brax::" (e.g.
                "brax::humanoid", etc.), then it is assumed that the target
                environment is a brax environment.
                If this is given as a string which does not contain "::" at
                all (e.g. "Humanoid-v4", etc.), then it is assumed that the
                target environment is a classical gym environment. Therefore,
                "gym::Humanoid-v4" and "Humanoid-v4" are equivalent.
                If this argument is given as a Callable (maybe a function or a
                class), then, with the assumption that this Callable expects
                a keyword argument `num_envs: int`, this Callable is called
                and its result (expected as a `gym.vector.VectorEnv` instance)
                is used as the environment.
            network: A network structure string, or a Callable (which can be
                a class inheriting from `torch.nn.Module`, or a function
                which returns a `torch.nn.Module` instance), or an instance
                of `torch.nn.Module`.
                The object provided here determines the structure of the
                neural network whose parameters will be evolved.
                A network structure string is a string which can be processed
                by `evotorch.neuroevolution.net.str_to_net(...)`.
                Please see the documentation of the function
                `evotorch.neuroevolution.net.str_to_net(...)` to see what
                such a neural network structure string looks like.
                Note that this network can be a recurrent network.
                When the network's `forward(...)` method can optionally accept
                an additional positional argument for the hidden state of the
                network and returns an additional value for its next state,
                then the policy is treated as a recurrent one.
                When the network is given as a callable object (e.g.
                a subclass of `nn.Module` or a function) and this callable
                object is decorated via `evotorch.decorators.pass_info`,
                the following keyword arguments will be passed:
                (i) `obs_length` (the length of the observation vector),
                (ii) `act_length` (the length of the action vector),
                (iii) `obs_shape` (the shape tuple of the observation space),
                (iv) `act_shape` (the shape tuple of the action space),
                (v) `obs_space` (the Box object specifying the observation
                space), and
                (vi) `act_space` (the Box object specifying the action
                space). Note that `act_space` will always be given as a
                `gym.spaces.Box` instance, even when the actual gym
                environment has a discrete action space. This is because
                `VecGymNE` always expects the neural network to return
                a tensor of floating-point numbers.
            env_config: Keyword arguments to pass to the environment while
                it is being created.
            max_num_envs: Maximum number of environments to be instantiated.
                By default, this is None, which means that the number of
                environments can go up to the population size (or up to the
                number of solutions that a remote actor receives, if the
                problem object is configured to have parallelization).
                For situations where the current reinforcement learning task
                requires large amount of resources (e.g. memory), allocating
                environments as much as the number of solutions might not
                be feasible. In such cases, one can set `max_num_envs` as an
                integer to bring an upper bound (in total, across all the
                remote actors, for when the problem is parallelized) to how
                many environments can be allocated.
            network_args: Any additional keyword argument to be used when
                instantiating the network can be specified via `network_args`
                as a dictionary. If there are no such additional keyword
                arguments, then `network_args` can be left as None.
                Note that the argument `network_args` is expected to be None
                when the network is specified as a `torch.nn.Module` instance.
            observation_normalization: Whether or not online normalization
                will be done on the encountered observations.
            decrease_rewards_by: If given as a float, each reward will be
                decreased by this amount. For example, if the environment's
                reward function has a constant "alive bonus" (i.e. a bonus
                that is constantly added onto the reward as long as the
                agent is alive), and if you wish to negate this bonus,
                you can set `decrease_rewards_by` to this bonus amount,
                and the bonus will be nullified.
                If you do not wish to affect the rewards in this manner,
                keep this as None.
            alive_bonus_schedule: Use this to add a customized amount of
                alive bonus.
                If left as None (which is the default), additional alive
                bonus will not be added.
                If given as a tuple `(t, b)`, an alive bonus `b` will be
                added onto all the rewards beyond the timestep `t`.
                If given as a tuple `(t0, t1, b)`, a partial (linearly
                increasing towards `b`) alive bonus will be added onto
                all the rewards between the timesteps `t0` and `t1`,
                and a full alive bonus (which equals to `b`) will be added
                onto all the rewards beyond the timestep `t1`.
            action_noise_stdev: If given as a real number `s`, then, for
                each generated action, Gaussian noise with standard
                deviation `s` will be sampled, and then this sampled noise
                will be added onto the action.
                If action noise is not desired, then this argument can be
                left as None.
                For sampling the noise, the global random number generator
                of PyTorch on the simulator's device will be used.
            num_episodes: Number of episodes over which each policy will
                be evaluated. The default is 1.
            device: The device in which the population will be kept.
                If you wish to do a single-GPU evolution, we recommend
                to set this as "cuda" (or "cuda:0", or "cuda:1", etc.),
                assuming that the simulator will also instantiate itself
                on that same device.
                Alternatively, if you wish to do a multi-GPU evolution,
                we recommend to leave this as None or set this as "cpu",
                so that the main population will be kept on the cpu
                and the remote actors will perform their evaluations on
                the GPUs that are assigned to them.
            num_actors: Number of actors to create for parallelized
                evaluation of the solutions.
                Certain string values are also accepted.
                When given as "max" or as "num_cpus", the number of actors
                will be equal to the number of all available CPUs in the ray
                cluster.
                When given as "num_gpus", the number of actors will be
                equal to the number of all available GPUs in the ray
                cluster, and each actor will be assigned a GPU.
                When given as "num_devices", the number of actors will be
                equal to the minimum among the number of CPUs and the number
                of GPUs available in the cluster (or will be equal to the
                number of CPUs if there is no GPU), and each actor will be
                assigned a GPU (if available).
                If `num_actors` is given as "num_gpus" or "num_devices",
                the argument `num_gpus_per_actor` must not be used,
                and the `actor_config` dictionary must not contain the
                key "num_gpus".
                If `num_actors` is given as something other than "num_gpus"
                or "num_devices", and if you wish to assign GPUs to each
                actor, then please see the argument `num_gpus_per_actor`.
            num_gpus_per_actor: Number of GPUs to be assigned to each
                actor. This can be an integer or a float (for when you
                wish to assign fractional amounts of GPUs to actors).
                When `num_actors` has the special value "num_devices",
                the argument `num_gpus_per_actor` is expected to be left as
                None.
            num_subbatches: For when there are multiple actors, you can
                set this to an integer n if you wish the population
                to be divided exactly into n sub-batches. The actors, as they
                finish their currently assigned sub-batch of solutions,
                will pick the next un-evaluated sub-batch.
                If you specify too large numbers for this argument, then
                each sub-batch will be smaller.
                When working with vectorized simulators on GPU, having too
                many and too small sub-batches can hurt the performance.
                This argument can be left as None, in which case, assuming
                that `subbatch_size` is also None, the population will be
                split into m sub-batches, m being the number of actors.
            subbatch_size: For when there are multiple actors, you can
                set this to an integer n if you wish the population to be
                divided into sub-batches in such a way that each sub-batch
                will consist of exactly n solutions. The actors, as they
                finish their currently assigned sub-batch of solutions,
                will pick the next un-evaluated sub-batch.
                If you specify too small numbers for this argument, then
                there will be many sub-batches, each sub-batch having a
                small number of solutions.
                When working with vectorized simulators on GPU, having too
                many and too small sub-batches can hurt the performance.
                This argument can be left as None, in which case, assuming
                that `num_subbatches` is also None, the population will be
                split into m sub-batches, m being the number of actors.
            actor_config: Additional configuration to be used when creating
                each actor with the help of `ray` library.
                Can be left as None if additional configuration is not needed.
        """

        # Store the string or the Callable that will be used to generate the reinforcement learning environment.
        self._env_maker = env

        # Declare the variable which will store the environment.
        self._env: Optional[TorchWrapper] = None

        # Declare the variable which will store the batch size of the vectorized environment.
        self._num_envs: Optional[int] = None

        # Store the upper bound (if any) regarding how many environments can exist at the same time.
        self._max_num_envs: Optional[int] = None if max_num_envs is None else int(max_num_envs)

        # Actor-specific upper bound regarding how many environments can exist at the same time.
        # This variable will be filled by the `_parallelize(...)` method.
        self._actor_max_num_envs: Optional[int] = None

        # Declare the variable which stores whether or not we properly initialized the `_actor_max_num_envs` variable.
        self._actor_max_num_envs_ready: bool = False

        # Store the additional configurations to be used as keyword arguments while instantiating the environment.
        self._env_config: dict = {} if env_config is None else dict(env_config)

        # Declare the variable that will store the device of the simulator.
        # This variable will be filled when the first observation is received from the environment.
        # The device of the observation array received from the environment will determine the value of this variable.
        self._simulator_device: Optional[torch.device] = None

        # Store the neural network architecture (that might be a string or an `nn.Module` instance).
        self._architecture = network

        if network_args is None:
            # If `network_args` is given as None, change it to an empty dictionary
            network_args = {}

        if isinstance(network, str):
            # If the network is given as a string, then we will need the values for the constants `obs_length`,
            # `act_length`, and `obs_space`. To obtain those values, we use our helper function
            # `_env_constants_for_str_net(...)` which temporarily instantiates the specified environment and returns
            # its needed constants.
            env_constants = _env_constants_for_str_net(self._env_maker, **(self._env_config))
        elif isinstance(network, nn.Module):
            # If the network is an already instantiated nn.Module, then we do not prepare any pre-defined constants.
            env_constants = {}
        else:
            # If the network is given as a Callable, then we will need the values for the constants `obs_length`,
            # `act_length`, and `obs_space`. To obtain those values, we use our helper function
            # `_env_constants_for_callable_net(...)` which temporarily instantiates the specified environment and
            # returns its needed constants.
            env_constants = _env_constants_for_callable_net(self._env_maker, **(self._env_config))

        # Build a `Policy` instance according to the given architecture, and store it.
        if isinstance(network, str):
            instantiated_net = str_to_net(network, **{**env_constants, **network_args})
        elif isinstance(network, nn.Module):
            instantiated_net = network
        else:
            instantiated_net = pass_info_if_needed(network, env_constants)(**network_args)
        self._policy = Policy(instantiated_net)

        # Store the boolean which indicates whether or not there will be observation normalization.
        self._observation_normalization = bool(observation_normalization)

        # Declare the variables that will store the observation-related stats if observation normalization is enabled.
        self._obs_stats: Optional[RunningNorm] = None
        self._collected_stats: Optional[RunningNorm] = None

        # Store the number of episodes configuration given by the user.
        self._num_episodes = int(num_episodes)

        # Store the `decrease_rewards_by` configuration given by the user.
        self._decrease_rewards_by = None if decrease_rewards_by is None else float(decrease_rewards_by)

        if alive_bonus_schedule is None:
            # If `alive_bonus_schedule` argument is None, then we store it as None as well.
            self._alive_bonus_schedule = None
        else:
            # This is the case where the user has specified an `alive_bonus_schedule`.
            alive_bonus_schedule = list(alive_bonus_schedule)
            alive_bonus_schedule_length = len(alive_bonus_schedule)
            if alive_bonus_schedule_length == 2:
                # If `alive_bonus_schedule` was given as a 2-element sequence (t, b), then store it as (t, t, b).
                # This means that the partial alive bonus time window starts and ends at t, therefore, there will
                # be no alive bonus until t, and beginning with t, there will be full alive bonus.
                self._alive_bonus_schedule = [
                    int(alive_bonus_schedule[0]),
                    int(alive_bonus_schedule[0]),
                    float(alive_bonus_schedule[1]),
                ]
            elif alive_bonus_schedule_length == 3:
                # If `alive_bonus_schedule` was given as a 3-element sequence (t0, t1, b), then store those 3
                # elements.
                self._alive_bonus_schedule = [
                    int(alive_bonus_schedule[0]),
                    int(alive_bonus_schedule[1]),
                    float(alive_bonus_schedule[2]),
                ]
            else:
                # `alive_bonus_schedule` sequences with unrecognized lengths trigger an error.
                raise ValueError(
                    f"Received an invalid number of elements as the alive bonus schedule."
                    f" Expected 2 or 3 items, but got these: {alive_bonus_schedule}"
                    f" (having a length of {alive_bonus_schedule_length})."
                )

        # If `action_noise_stdev` is specified, store it.
        self._action_noise_stdev = None if action_noise_stdev is None else float(action_noise_stdev)

        # Initialize the counters for the number of simulator interactions and the number of episodes.
        self._interaction_count: int = 0
        self._episode_count: int = 0

        # Call the superclass
        super().__init__(
            objective_sense="max",
            initial_bounds=(-0.00001, 0.00001),
            solution_length=self._policy.parameter_length,
            device=device,
            dtype=torch.float32,
            num_actors=num_actors,
            num_gpus_per_actor=num_gpus_per_actor,
            actor_config=actor_config,
            num_subbatches=num_subbatches,
            subbatch_size=subbatch_size,
        )

    def _parallelize(self):
        super()._parallelize()
        if self.is_main:
            if not self._actor_max_num_envs_ready:
                if self._actors is None:
                    self._actor_max_num_envs = self._max_num_envs
                else:
                    if self._max_num_envs is not None:
                        max_num_envs_per_actor = split_workload(self._max_num_envs, len(self._actors))
                        for i_actor, actor in enumerate(self._actors):
                            actor.call.remote("_set_actor_max_num_envs", max_num_envs_per_actor[i_actor])
                self._actor_max_num_envs_ready = True

    def _set_actor_max_num_envs(self, n: int):
        self._actor_max_num_envs = n
        self._actor_max_num_envs_ready = True

    @property
    def observation_normalization(self) -> bool:
        return self._observation_normalization

    def set_episode_count(self, n: int):
        """
        Set the episode count manually.
        """
        self._episode_count = int(n)

    def set_interaction_count(self, n: int):
        """
        Set the interaction count manually.
        """
        self._interaction_count = int(n)

    @property
    def interaction_count(self) -> int:
        """
        Get the total number of simulator interactions made.
        """
        return self._interaction_count

    @property
    def episode_count(self) -> int:
        """
        Get the total number of episodes completed.
        """
        return self._episode_count

    def _get_local_episode_count(self) -> int:
        return self.episode_count

    def _get_local_interaction_count(self) -> int:
        return self.interaction_count

    def _get_env(self, num_policies: int) -> TorchWrapper:
        # Get the existing environment instance stored by this VecGymNE, after (re)building it if needed.

        if (self._env is None) or (num_policies > self._num_envs):
            # If this VecGymNE does not have its environment ready yet (i.e. the `_env` attribute is None)
            # or if the batch size of the previously instantiated environment is not enough to deal with
            # the number of policies (i.e. the `_num_envs` attribute is less than `num_policies`), then
            # we (re)build the environment.

            # Keyword arguments to pass to the TorchWrapper.
            torch_wrapper_cfg = dict(
                force_classic_api=True,
                discrete_to_continuous_act=True,
                clip_actions=True,
            )

            if isinstance(self._env_maker, str):
                # If the environment is specified via a string, then we use our `make_vector_env` function.
                self._env = make_vector_env(
                    self._env_maker, num_envs=num_policies, **torch_wrapper_cfg, **(self._env_config)
                )
            else:
                # If the environment is specified via a Callable, then we call it.
                # We expect this Callable to accept a keyword argument named `num_envs`, and additionally, we pass
                # the environment configuration dictionary as keyword arguments.
                self._env = self._env_maker(num_envs=num_policies, **(self._env_config))

                if not isinstance(self._env, gym.vector.VectorEnv):
                    # If what is returned by the Callable is not a vectorized environment, then we trigger an error.
                    raise TypeError("This is not a vectorized environment")

                # We wrap the returned vectorized environment with a TorchWrapper, so that the actions that we send
                # and the observations and rewards that we receive are PyTorch tensors.
                self._env = TorchWrapper(self._env, **torch_wrapper_cfg)

            if self._env.num_envs != num_policies:
                # If the batch size of the resulting vectorized environment differs from the number of policies,
                # then we trigger an error.
                raise ValueError("Incompatible number of environments")

            # We update the batch size of the created environment.
            self._num_envs = num_policies

            if not isinstance(self._env.single_observation_space, Box):
                # If the observation space is not Box, then we trigger an error.
                raise TypeError(
                    f"Unsupported observation type: {self._env.single_observation_space}."
                    f" Only Box-typed observation spaces are supported."
                )

            try:
                # If possible, use the `seed(...)` method to explicitly randomize the environment.
                # Although the new gym API removed the seed method, some environments define their own `seed(...)`
                # method for randomization.
                new_seed = random.randint(0, (2**32) - 1)
                self._env.seed(new_seed)
            except Exception:
                # Our attempt at manually seeding the environment has failed.
                # This could be because the environment does not have a `seed(...)` method.
                # Nothing to do.
                pass

        return self._env

    @property
    def _nonserialized_attribs(self):
        # Call the `_nonserialized_attribs` property implementation of the superclass to receive the base list
        # of non-serialized attributes, then add "_env" to this base list, and then return the resulting list.
        return super()._nonserialized_attribs + ["_env"]

    @property
    def _grad_device(self) -> torch.device:
        # For distributed mode, this property determines the device in which the temporary populations will be made
        # for gradient computation.

        if self._simulator_device is None:
            # If the simulator device is not known yet, then we return the cpu device.
            return torch.device("cpu")
        else:
            # If the simulator device is known, then we return that device.
            return self._simulator_device

    def _make_running_norm(self, observation: torch.Tensor) -> RunningNorm:
        # Make a new RunningNorm instance according to the observation tensor.
        # The dtype and the device of the new RunningNorm is taken from the observation.
        # This new RunningNorm is empty (i.e. does not contain any stats yet).
        return RunningNorm(shape=observation.shape[1:], dtype=observation.dtype, device=observation.device)

    def _transfer_running_norm(self, rn: RunningNorm, observation: torch.Tensor) -> RunningNorm:
        # Transfer (if necessary) the RunningNorm to the device of the observation tensor.
        # The returned RunningNorm may be the RunningNorm itself (if the device did not change)
        # or a new copy (if the device did change).
        if torch.device(rn.device) != torch.device(observation.device):
            rn = rn.to(observation.device)
        return rn

    def _normalize_observation(
        self, observation: torch.Tensor, *, mask: Optional[torch.Tensor] = None, update_stats: bool = True
    ) -> torch.Tensor:
        # This function normalizes the received observation batch.
        # If a mask is given (as a tensor of booleans), only observations with corresponding mask value set as True
        # will be taken into consideration.
        # If `update_stats` is given as True and observation normalization is enabled, then we will update the
        # RunningNorm instances as well.

        if self._observation_normalization:
            # This is the case where observation normalization is enabled.
            if self._obs_stats is None:
                # If we do not have observation stats yet, we build a new one (according to the dtype and device
                # of the observation).
                self._obs_stats = self._make_running_norm(observation)
            else:
                # If we already have observation stats, we make sure that it is in the correct device.
                self._obs_stats = self._transfer_running_norm(self._obs_stats, observation)

            if update_stats:
                # This is the case where the `update_stats` argument was encountered as True.
                if self._collected_stats is None:
                    # If the RunningNorm responsible to collect new stats is not built yet, we build it here
                    # (according to the dtype and device of the observation).
                    self._collected_stats = self._make_running_norm(observation)
                else:
                    # If the RunningNorm responsible to collect new stats already exists, then we make sure
                    # that it is in the correct device.
                    self._collected_stats = self._transfer_running_norm(self._collected_stats, observation)

                # We first update the RunningNorm responsible for collecting the new stats.
                self._collected_stats.update(observation, mask)

                # We now update the RunningNorm which stores all the stats, and return the normalized observation.
                result = self._obs_stats.update_and_normalize(observation, mask)
            else:
                # This is the case where the `update_stats` argument was encountered as False.
                # Here we normalize the observation but do not update our existing RunningNorm instances.
                result = self._obs_stats.normalize(observation)
            return result
        else:
            # This is the case where observation normalization is disabled.
            # In this case, we just return the observation as it is.
            return observation

    def _ensure_obsnorm(self):
        if not self.observation_normalization:
            raise ValueError("This feature can only be used when observation_normalization=True.")

    def get_observation_stats(self) -> RunningNorm:
        """Get the observation stats"""
        self._ensure_obsnorm()
        return self._obs_stats

    def _make_sync_data_for_actors(self) -> Any:
        if self.observation_normalization:
            obs_stats = self.get_observation_stats()
            if obs_stats is not None:
                obs_stats = obs_stats.to("cpu")
            return dict(obs_stats=obs_stats)
        else:
            return None

    def set_observation_stats(self, rn: RunningNorm):
        """Set the observation stats"""
        self._ensure_obsnorm()
        self._obs_stats = rn

    def _use_sync_data_from_main(self, received: dict):
        for k, v in received.items():
            if k == "obs_stats":
                self.set_observation_stats(v)

    def pop_observation_stats(self) -> RunningNorm:
        """Get and clear the collected observation stats"""
        self._ensure_obsnorm()
        result = self._collected_stats
        self._collected_stats = None
        return result

    def _make_sync_data_for_main(self) -> Any:
        result = dict(episode_count=self.episode_count, interaction_count=self.interaction_count)

        if self.observation_normalization:
            collected = self.pop_observation_stats()
            if collected is not None:
                collected = collected.to("cpu")
            result["obs_stats_delta"] = collected

        return result

    def update_observation_stats(self, rn: RunningNorm):
        """Update the observation stats via another RunningNorm instance"""
        self._ensure_obsnorm()
        if self._obs_stats is None:
            self._obs_stats = rn
        else:
            self._obs_stats.update(rn)

    def _use_sync_data_from_actors(self, received: list):
        total_episode_count = 0
        total_interaction_count = 0

        for data in received:
            data: dict
            total_episode_count += data["episode_count"]
            total_interaction_count += data["interaction_count"]
            if self.observation_normalization:
                self.update_observation_stats(data["obs_stats_delta"])

        self.set_episode_count(total_episode_count)
        self.set_interaction_count(total_interaction_count)

    def _make_pickle_data_for_main(self) -> dict:
        # For when the main Problem object (the non-remote one) gets pickled,
        # this function returns the counters of this remote Problem instance,
        # to be sent to the main one.
        return dict(interaction_count=self.interaction_count, episode_count=self.episode_count)

    def _use_pickle_data_from_main(self, state: dict):
        # For when a newly unpickled Problem object gets (re)parallelized,
        # this function restores the inner states specific to this remote
        # worker. In the case of GymNE, those inner states are episode
        # and interaction counters.
        for k, v in state.items():
            if k == "episode_count":
                self.set_episode_count(v)
            elif k == "interaction_count":
                self.set_interaction_count(v)
            else:
                raise ValueError(f"When restoring the inner state of a remote worker, unrecognized state key: {k}")

    def _evaluate_batch(self, batch: SolutionBatch):
        if self._actor_max_num_envs is None:
            self._evaluate_subbatch(batch)
        else:
            subbatches = batch.split(max_size=self._actor_max_num_envs)
            for subbatch in subbatches:
                self._evaluate_subbatch(subbatch)

    def _evaluate_subbatch(self, batch: SolutionBatch):
        # Get the number of solutions and the solution batch from the shape of the batch.
        num_solutions, solution_length = batch.values_shape

        # Get (possibly after (re)building) the environment object.
        env = self._get_env(num_solutions)

        # Reset the environment and receive the first observation batch.
        obs_per_env = env.reset()

        # Update the simulator device according to the device of the observation batch received.
        self._simulator_device = obs_per_env.device

        # Get the number of environments.
        num_envs = obs_per_env.shape[0]

        # Transfer (if necessary) the solutions (which are the network parameters) to the simulator device.
        batch_values = batch.values.to(self._simulator_device)

        if num_solutions == num_envs:
            # If the number of solutions is equal to the number of environments, then we declare all of the solutions
            # as the network parameters, and we declare all of these environments active.
            params_per_env = batch_values
            active_per_env = torch.ones(num_solutions, dtype=torch.bool, device=self._simulator_device)
        elif num_solutions < num_envs:
            # If the number of solutions is less than the number of environments, then we allocate a new empty
            # tensor to represent the network parameters.
            params_per_env = torch.empty((num_envs, solution_length), dtype=batch.dtype, device=self._simulator_device)

            # The first `num_solutions` rows of this new parameters tensor is filled with the values of the solutions.
            params_per_env[:num_solutions, :] = batch_values

            # The remaining parameters become the clones of the first solution.
            params_per_env[num_solutions:, :] = batch_values[0]

            # At first, all the environments are declared as inactive.
            active_per_env = torch.zeros(num_envs, dtype=torch.bool, device=self._simulator_device)

            # Now, the first `num_solutions` amount of environments is declared as active.
            # The remaining ones remain inactive.
            active_per_env[:num_solutions] = True
        else:
            # This branch should be unreachable, because `_get_env(...)` always provides at least
            # `num_solutions` environments.
            assert False, "Received incompatible number of environments"

        # We get the policy and fill it with the parameters stored by the solutions.
        policy = self._policy
        policy.set_parameters(params_per_env)

        # Declare the counter which stores the total timesteps encountered during this evaluation.
        total_timesteps = 0

        # Declare the counters (one for each environment) storing the number of episodes completed.
        num_eps_per_env = torch.zeros(num_envs, dtype=torch.int64, device=self._simulator_device)

        # Declare the scores (one for each environment).
        score_per_env = torch.zeros(num_envs, dtype=torch.float32, device=self._simulator_device)

        if self._alive_bonus_schedule is not None:
            # If an alive_bonus_schedule was provided, then we extract the timesteps.
            # bonus_t0 is the timestep where the partial alive bonus will start.
            # bonus_t1 is the timestep where the full alive bonus will start.
            # alive_bonus is the amount that will be added to reward if the agent is alive.
            bonus_t0, bonus_t1, alive_bonus = self._alive_bonus_schedule

            if bonus_t1 > bonus_t0:
                # If bonus_t1 is bigger than bonus_t0, then we have a partial alive bonus time window.
                add_partial_alive_bonus = True

                # We compute and store the length of the time window.
                bonus_t_gap_as_float = float(bonus_t1 - bonus_t0)
            else:
                # If bonus_t1 is NOT bigger than bonus_t0, then we do NOT have a partial alive bonus time window.
                add_partial_alive_bonus = False

            # To properly give the alive bonus for each solution, we need to keep track of the timesteps for all
            # the running solutions. So, we declare the following variable.
            t_per_env = torch.zeros(num_envs, dtype=torch.int64, device=self._simulator_device)

        # We normalize the initial observation.
        obs_per_env = self._normalize_observation(obs_per_env, mask=active_per_env)

        while True:
            # Pass the observations through the policy and get the actions to perform.
            action_per_env = policy(torch.as_tensor(obs_per_env, dtype=params_per_env.dtype))

            if self._action_noise_stdev is not None:
                # If we are to apply action noise, we sample from a Gaussian distribution and add the noise onto
                # the actions.
                action_per_env = action_per_env + (torch.randn_like(action_per_env) * self._action_noise_stdev)

            # Apply the actions, get the observations, rewards, and the 'done' flags.
            obs_per_env, reward_per_env, done_per_env, _ = env.step(action_per_env)

            if self._decrease_rewards_by is not None:
                # We decrease the rewards, if we have the configuration to do so.
                reward_per_env = reward_per_env - self._decrease_rewards_by

            if self._alive_bonus_schedule is not None:
                # Here we handle the alive bonus schedule.

                # For each environment, increment the timestep.
                t_per_env[active_per_env] += 1

                # For those who are within the full alive bonus time region, increase the scores by the full amount.
                in_full_bonus_t_per_env = active_per_env & (t_per_env >= bonus_t1)
                score_per_env[in_full_bonus_t_per_env] += alive_bonus

                if add_partial_alive_bonus:
                    # Here we handle the partial alive bonus time window.
                    # We first determine which environments are in the partial alive bonus time window.
                    in_partial_bonus_t_per_env = active_per_env & (t_per_env >= bonus_t0) & (t_per_env < bonus_t1)

                    # Here we compute the partial alive bonuses and add those bonuses to the scores.
                    score_per_env[in_partial_bonus_t_per_env] += alive_bonus * (
                        torch.as_tensor(t_per_env[in_partial_bonus_t_per_env] - bonus_t0, dtype=torch.float32)
                        / bonus_t_gap_as_float
                    )

                # Determine which environments just finished their episodes.
                just_finished_per_env = active_per_env & done_per_env

                # Reset the timestep counters of the environments which are just finished.
                t_per_env[just_finished_per_env] = 0

            # For each active environment, increase the score by the reward received.
            score_per_env[active_per_env] += reward_per_env[active_per_env]

            # Update the total timesteps counter.
            total_timesteps += int(torch.sum(active_per_env))

            # Reset the policies whose episodes are done (so that their hidden states become 0).
            policy.reset(done_per_env)

            # Update the number of episodes counter for each environment.
            num_eps_per_env[done_per_env] += 1

            # Solutions that have completed at least the allowed number of episodes become inactive.
            active_per_env[:num_solutions] = num_eps_per_env[:num_solutions] < self._num_episodes

            if not torch.any(active_per_env[:num_solutions]):
                # If there is not a single active solution left, then we exit this loop.
                break

            # For the next iteration of this loop, we normalize the observation.
            obs_per_env = self._normalize_observation(obs_per_env, mask=active_per_env)

        # Update the interaction count and the episode count stored by this VecGymNE instance.
        self._interaction_count += total_timesteps
        self._episode_count += num_solutions * self._num_episodes

        # Compute the fitnesses
        fitnesses = score_per_env[:num_solutions]
        if self._num_episodes > 1:
            fitnesses /= self._num_episodes

        # Assign the scores to the solutions as fitnesses.
        batch.set_evals(fitnesses)

    def get_env(self) -> Optional[gym.Env]:
        """
        Get the gym environment.

        Returns:
            The gym environment if it is built. If not built yet, None.
        """
        return self._env

    def to_policy(self, solution: Iterable, *, with_wrapper_modules: bool = True) -> nn.Module:
        """
        Convert the given solution to a policy.

        Args:
            solution: A solution which can be given as a `torch.Tensor`, as a
                `Solution`, or as any `Iterable`.
            with_wrapper_modules: Whether or not to wrap the policy module
                with helper modules so that observations are normalized
                and actions are clipped to be within the correct boundaries.
                The default and the recommended value is True.
        Returns:
            The policy, as a `torch.nn.Module` instance.
        """
        # Get the gym environment
        env = self._get_env(1)

        # Get the action space and its lower and upper bounds.
        act_space = env.single_action_space
        low = act_space.low
        high = act_space.high

        # If the lower and upper bounds are not -inf and +inf respectively, then the actions need clipping.
        needs_clipping = _numpy_arrays_specify_bounds(low, high)

        # Convert the solution to a PyTorch tensor on cpu.
        if isinstance(solution, torch.Tensor):
            solution = solution.to("cpu")
        elif isinstance(solution, Solution):
            solution = solution.values.clone().to("cpu")
        else:
            solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

        # Convert the internally stored policy to a PyTorch module.
        result = self._policy.to_torch_module(solution)

        if with_wrapper_modules:
            if self.observation_normalization and (self._obs_stats is not None):
                # If observation normalization is needed and there are collected observation stats, then we wrap the
                # policy with an ObsNormWrapperModule.
                result = ObsNormWrapperModule(result, self._obs_stats)

            if needs_clipping:
                # If clipping is needed, then we wrap the policy with an ActClipWrapperModule.
                result = ActClipWrapperModule(result, act_space)

        return result

    def save_solution(self, solution: Iterable, fname: Union[str, Path]):
        """
        Save the solution into a pickle file.
        Among the saved data within the pickle file are the solution
        (as a PyTorch tensor), the policy (as a `torch.nn.Module` instance),
        and observation stats (if any).

        Args:
            solution: The solution to be saved. This can be a PyTorch tensor,
                a `Solution` instance, or any `Iterable`.
            fname: The file name of the pickle file to be created.
        """

        # Convert the solution to a PyTorch tensor on the cpu.
        if isinstance(solution, torch.Tensor):
            solution = solution.to("cpu")
        elif isinstance(solution, Solution):
            solution = solution.values.clone().to("cpu")
        else:
            solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

        if isinstance(solution, ReadOnlyTensor):
            solution = solution.as_subclass(torch.Tensor)

        # Store the solution and the policy.
        result = {
            "solution": solution,
            "policy": self.to_policy(solution),
        }

        # If available, store the observation stats.
        if self.observation_normalization and (self._obs_stats is not None):
            result["obs_mean"] = self._obs_stats.mean.to("cpu")
            result["obs_stdev"] = self._obs_stats.stdev.to("cpu")
            result["obs_sum"] = self._obs_stats.sum.to("cpu")
            result["obs_sum_of_squares"] = self._obs_stats.sum_of_squares.to("cpu")

        # Some additional data.
        result["interaction_count"] = self.interaction_count
        result["episode_count"] = self.episode_count
        result["time"] = datetime.now()

        if isinstance(self._env_maker, str):
            # If the environment was specified via a string, store the string.
            result["env"] = self._env_maker

        # Store the network architecture.
        result["architecture"] = self._architecture

        # Save the dictionary which stores the data.
        with open(fname, "wb") as f:
            pickle.dump(result, f)

    @property
    def max_num_envs(self) -> Optional[int]:
        """
        Maximum number of environments to be allocated.

        If a maximum number of environments is not set, then None is returned.
        If this problem instance is the main one, then the overall maximum
        number of environments is returned.
        If this problem instance is a remote one (i.e. is on a remote actor)
        then the maximum number of environments for that actor is returned.
        """
        if self.is_main:
            return self._max_num_envs
        else:
            return self._actor_max_num_envs

    def make_net(self, solution: Iterable) -> nn.Module:
        """
        Make a new policy network parameterized by the given solution.
        Note that this parameterized network assumes that the observation
        is already normalized, and it does not do action clipping to ensure
        that the generated actions are within valid bounds.

        To have a policy network which has its own observation normalization
        and action clipping layers, please see the method `to_policy(...)`.

        Args:
            solution: The solution which stores the parameters.
                This can be a Solution instance, or a 1-dimensional tensor,
                or any Iterable of real numbers.
        Returns:
            The policy network, as a PyTorch module.
        """
        return self.to_policy(solution, with_wrapper_modules=False)

    @property
    def network_device(self) -> Optional[Device]:
        """
        The device on which the policy networks will operate.

        Specific to VecGymNE, the network device is determined only
        after receiving the first observation from the reinforcement
        learning environment. Until then, this property has the value
        None.
        """
        return self._simulator_device

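To make the parameter padding in `_evaluate_subbatch` above concrete: when there are fewer solutions than environments, the extra parameter rows are clones of the first solution and their environments are marked inactive, so their rewards never reach the scores. Below is a minimal, hedged illustration with hypothetical sizes (not part of the library):

import torch

num_solutions, num_envs, solution_length = 3, 5, 4
batch_values = torch.randn(num_solutions, solution_length)

# Pad the parameter batch exactly as `_evaluate_subbatch` does.
params_per_env = torch.empty((num_envs, solution_length))
params_per_env[:num_solutions, :] = batch_values      # the real solutions
params_per_env[num_solutions:, :] = batch_values[0]   # clones of solution 0

# Only the first `num_solutions` environments are active.
active_per_env = torch.zeros(num_envs, dtype=torch.bool)
active_per_env[:num_solutions] = True
# active_per_env: tensor([True, True, True, False, False])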
episode_count: int property readonly

Get the total number of episodes completed.

interaction_count: int property readonly

Get the total number of simulator interactions made.

max_num_envs: Optional[int] property readonly

Maximum number of environments to be allocated.

If a maximum number of environments is not set, then None is returned. If this problem instance is the main one, then the overall maximum number of environments is returned. If this problem instance is a remote one (i.e. is on a remote actor) then the maximum number of environments for that actor is returned.

network_device: Union[str, torch.device] property readonly

The device on which the policy networks will operate.

Specific to VecGymNE, the network device is determined only after receiving the first observation from the reinforcement learning environment. Until then, this property has the value None.

__init__(self, env, network, *, env_config=None, max_num_envs=None, network_args=None, observation_normalization=False, decrease_rewards_by=None, alive_bonus_schedule=None, action_noise_stdev=None, num_episodes=1, device=None, num_actors=None, num_gpus_per_actor=None, num_subbatches=None, subbatch_size=None, actor_config=None) special

Initialize the VecGymNE.

Parameters:

- env (Union[str, Callable], required): Environment to be solved. If this is given as a string starting with "gym::" (e.g. "gym::Humanoid-v4", etc.), then it is assumed that the target environment is a classical gym environment. If this is given as a string starting with "brax::" (e.g. "brax::humanoid", etc.), then it is assumed that the target environment is a brax environment. If this is given as a string which does not contain "::" at all (e.g. "Humanoid-v4", etc.), then it is assumed that the target environment is a classical gym environment. Therefore, "gym::Humanoid-v4" and "Humanoid-v4" are equivalent. If this argument is given as a Callable (maybe a function or a class), then, with the assumption that this Callable expects a keyword argument num_envs: int, this Callable is called and its result (expected as a gym.vector.VectorEnv instance) is used as the environment.
- network (Union[str, Callable, torch.nn.modules.module.Module], required): A network structure string, or a Callable (which can be a class inheriting from torch.nn.Module, or a function which returns a torch.nn.Module instance), or an instance of torch.nn.Module. The object provided here determines the structure of the neural network whose parameters will be evolved. A network structure string is a string which can be processed by evotorch.neuroevolution.net.str_to_net(...). Please see the documentation of the function evotorch.neuroevolution.net.str_to_net(...) to see how such a neural network structure string looks like. Note that this network can be a recurrent network. When the network's forward(...) method can optionally accept an additional positional argument for the hidden state of the network and returns an additional value for its next state, then the policy is treated as a recurrent one. When the network is given as a callable object (e.g. a subclass of nn.Module or a function) and this callable object is decorated via evotorch.decorators.pass_info, the following keyword arguments will be passed: (i) obs_length (the length of the observation vector), (ii) act_length (the length of the action vector), (iii) obs_shape (the shape tuple of the observation space), (iv) act_shape (the shape tuple of the action space), (v) obs_space (the Box object specifying the observation space), and (vi) act_space (the Box object specifying the action space). Note that act_space will always be given as a gym.spaces.Box instance, even when the actual gym environment has a discrete action space. This is because VecGymNE always expects the neural network to return a tensor of floating-point numbers.
- env_config (Optional[collections.abc.Mapping], default None): Keyword arguments to pass to the environment while it is being created.
- max_num_envs (Optional[int], default None): Maximum number of environments to be instantiated. By default, this is None, which means that the number of environments can go up to the population size (or up to the number of solutions that a remote actor receives, if the problem object is configured to have parallelization). For situations where the current reinforcement learning task requires a large amount of resources (e.g. memory), allocating as many environments as the number of solutions might not be feasible. In such cases, one can set max_num_envs as an integer to bring an upper bound (in total, across all the remote actors, for when the problem is parallelized) to how many environments can be allocated.
- network_args (Optional[collections.abc.Mapping], default None): Any additional keyword argument to be used when instantiating the network can be specified via network_args as a dictionary. If there are no such additional keyword arguments, then network_args can be left as None. Note that the argument network_args is expected to be None when the network is specified as a torch.nn.Module instance.
- observation_normalization (bool, default False): Whether or not online normalization will be done on the encountered observations.
- decrease_rewards_by (Optional[float], default None): If given as a float, each reward will be decreased by this amount. For example, if the environment's reward function has a constant "alive bonus" (i.e. a bonus that is constantly added onto the reward as long as the agent is alive), and if you wish to negate this bonus, you can set decrease_rewards_by to this bonus amount, and the bonus will be nullified. If you do not wish to affect the rewards in this manner, keep this as None.
- alive_bonus_schedule (Optional[tuple], default None): Use this to add a customized amount of alive bonus. If left as None (which is the default), additional alive bonus will not be added. If given as a tuple (t, b), an alive bonus b will be added onto all the rewards beyond the timestep t. If given as a tuple (t0, t1, b), a partial (linearly increasing towards b) alive bonus will be added onto all the rewards between the timesteps t0 and t1, and a full alive bonus (which equals b) will be added onto all the rewards beyond the timestep t1. See the sketch after this list for the exact per-timestep bonus.
- action_noise_stdev (Optional[float], default None): If given as a real number s, then, for each generated action, Gaussian noise with standard deviation s will be sampled, and then this sampled noise will be added onto the action. If action noise is not desired, then this argument can be left as None. For sampling the noise, the global random number generator of PyTorch on the simulator's device will be used.
- num_episodes (int, default 1): Number of episodes over which each policy will be evaluated.
- device (Union[str, torch.device], default None): The device in which the population will be kept. If you wish to do a single-GPU evolution, we recommend setting this to "cuda" (or "cuda:0", or "cuda:1", etc.), assuming that the simulator will also instantiate itself on that same device. Alternatively, if you wish to do a multi-GPU evolution, we recommend leaving this as None or setting this to "cpu", so that the main population will be kept on the cpu and the remote actors will perform their evaluations on the GPUs that are assigned to them.
- num_actors (Union[int, str], default None): Number of actors to create for parallelized evaluation of the solutions. Certain string values are also accepted. When given as "max" or as "num_cpus", the number of actors will be equal to the number of all available CPUs in the ray cluster. When given as "num_gpus", the number of actors will be equal to the number of all available GPUs in the ray cluster, and each actor will be assigned a GPU. When given as "num_devices", the number of actors will be equal to the minimum among the number of CPUs and the number of GPUs available in the cluster (or will be equal to the number of CPUs if there is no GPU), and each actor will be assigned a GPU (if available). If num_actors is given as "num_gpus" or "num_devices", the argument num_gpus_per_actor must not be used, and the actor_config dictionary must not contain the key "num_gpus". If num_actors is given as something other than "num_gpus" or "num_devices", and if you wish to assign GPUs to each actor, then please see the argument num_gpus_per_actor.
- num_gpus_per_actor (Optional[int], default None): Number of GPUs to be assigned to each actor. This can be an integer or a float (for when you wish to assign fractional amounts of GPUs to actors). When num_actors has the special value "num_devices", the argument num_gpus_per_actor is expected to be left as None.
- num_subbatches (Optional[int], default None): For when there are multiple actors, you can set this to an integer n if you wish the population to be divided exactly into n sub-batches. The actors, as they finish their currently assigned sub-batch of solutions, will pick the next un-evaluated sub-batch. If you specify too large numbers for this argument, then each sub-batch will be smaller. When working with vectorized simulators on GPU, having too many and too small sub-batches can hurt the performance. This argument can be left as None, in which case, assuming that subbatch_size is also None, the population will be split into m sub-batches, m being the number of actors.
- subbatch_size (Optional[int], default None): For when there are multiple actors, you can set this to an integer n if you wish the population to be divided into sub-batches in such a way that each sub-batch will consist of exactly n solutions. The actors, as they finish their currently assigned sub-batch of solutions, will pick the next un-evaluated sub-batch. If you specify too small numbers for this argument, then there will be many sub-batches, each sub-batch having a small number of solutions. When working with vectorized simulators on GPU, having too many and too small sub-batches can hurt the performance. This argument can be left as None, in which case, assuming that num_subbatches is also None, the population will be split into m sub-batches, m being the number of actors.
- actor_config (Optional[collections.abc.Mapping], default None): Additional configuration to be used when creating each actor with the help of the ray library. Can be left as None if additional configuration is not needed.
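The per-timestep bonus implied by a normalized schedule (t0, t1, b) can be written down explicitly. The following helper is an illustrative sketch, not part of the library; it mirrors the logic of the evaluation loop in the class source above:

def alive_bonus_at(t: int, t0: int, t1: int, b: float) -> float:
    # Illustrative helper (not part of VecGymNE): the alive bonus
    # added to the reward at timestep t for a schedule normalized
    # to the form (t0, t1, b).
    if t < t0:
        return 0.0  # no bonus before t0
    if t < t1:
        # Linear ramp from 0 towards b over the window [t0, t1).
        return b * (t - t0) / (t1 - t0)
    return b  # full bonus from t1 onward

Note that a 2-element schedule (t, b) is stored internally as (t, t, b) (see the source code below), which makes the ramp window empty: the full bonus starts exactly at timestep t.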
Source code in evotorch/neuroevolution/vecgymne.py
def __init__(
    self,
    env: Union[str, Callable],
    network: Union[str, Callable, nn.Module],
    *,
    env_config: Optional[Mapping] = None,
    max_num_envs: Optional[int] = None,
    network_args: Optional[Mapping] = None,
    observation_normalization: bool = False,
    decrease_rewards_by: Optional[float] = None,
    alive_bonus_schedule: Optional[tuple] = None,
    action_noise_stdev: Optional[float] = None,
    num_episodes: int = 1,
    device: Optional[Device] = None,
    num_actors: Optional[Union[int, str]] = None,
    num_gpus_per_actor: Optional[int] = None,
    num_subbatches: Optional[int] = None,
    subbatch_size: Optional[int] = None,
    actor_config: Optional[Mapping] = None,
):
    """
    Initialize the VecGymNE.

    Args:
        env: Environment to be solved.
            If this is given as a string starting with "gym::" (e.g.
            "gym::Humanoid-v4", etc.), then it is assumed that the target
            environment is a classical gym environment.
            If this is given as a string starting with "brax::" (e.g.
            "brax::humanoid", etc.), then it is assumed that the target
            environment is a brax environment.
            If this is given as a string which does not contain "::" at
            all (e.g. "Humanoid-v4", etc.), then it is assumed that the
            target environment is a classical gym environment. Therefore,
            "gym::Humanoid-v4" and "Humanoid-v4" are equivalent.
            If this argument is given as a Callable (maybe a function or a
            class), then, with the assumption that this Callable expects
            a keyword argument `num_envs: int`, this Callable is called
            and its result (expected as a `gym.vector.VectorEnv` instance)
            is used as the environment.
        network: A network structure string, or a Callable (which can be
            a class inheriting from `torch.nn.Module`, or a function
            which returns a `torch.nn.Module` instance), or an instance
            of `torch.nn.Module`.
            The object provided here determines the structure of the
            neural network whose parameters will be evolved.
            A network structure string is a string which can be processed
            by `evotorch.neuroevolution.net.str_to_net(...)`.
            Please see the documentation of the function
            `evotorch.neuroevolution.net.str_to_net(...)` to see how such
            a neural network structure string looks like.
            Note that this network can be a recurrent network.
            When the network's `forward(...)` method can optionally accept
            an additional positional argument for the hidden state of the
            network and returns an additional value for its next state,
            then the policy is treated as a recurrent one.
            When the network is given as a callable object (e.g.
            a subclass of `nn.Module` or a function) and this callable
            object is decorated via `evotorch.decorators.pass_info`,
            the following keyword arguments will be passed:
            (i) `obs_length` (the length of the observation vector),
            (ii) `act_length` (the length of the action vector),
            (iii) `obs_shape` (the shape tuple of the observation space),
            (iv) `act_shape` (the shape tuple of the action space),
            (v) `obs_space` (the Box object specifying the observation
            space), and
            (vi) `act_space` (the Box object specifying the action
            space). Note that `act_space` will always be given as a
            `gym.spaces.Box` instance, even when the actual gym
            environment has a discrete action space. This is because
            `VecGymNE` always expects the neural network to return
            a tensor of floating-point numbers.
        env_config: Keyword arguments to pass to the environment while
            it is being created.
        max_num_envs: Maximum number of environments to be instantiated.
            By default, this is None, which means that the number of
            environments can go up to the population size (or up to the
            number of solutions that a remote actor receives, if the
            problem object is configured to have parallelization).
            For situations where the current reinforcement learning task
            requires a large amount of resources (e.g. memory), allocating
            as many environments as the number of solutions might not
            be feasible. In such cases, one can set `max_num_envs` as an
            integer to bring an upper bound (in total, across all the
            remote actors, for when the problem is parallelized) to how
            many environments can be allocated.
        network_args: Any additional keyword argument to be used when
            instantiating the network can be specified via `network_args`
            as a dictionary. If there are no such additional keyword
            arguments, then `network_args` can be left as None.
            Note that the argument `network_args` is expected to be None
            when the network is specified as a `torch.nn.Module` instance.
        observation_normalization: Whether or not online normalization
            will be done on the encountered observations.
        decrease_rewards_by: If given as a float, each reward will be
            decreased by this amount. For example, if the environment's
            reward function has a constant "alive bonus" (i.e. a bonus
            that is constantly added onto the reward as long as the
            agent is alive), and if you wish to negate this bonus,
            you can set `decrease_rewards_by` to this bonus amount,
            and the bonus will be nullified.
            If you do not wish to affect the rewards in this manner,
            keep this as None.
        alive_bonus_schedule: Use this to add a customized amount of
            alive bonus.
            If left as None (which is the default), additional alive
            bonus will not be added.
            If given as a tuple `(t, b)`, an alive bonus `b` will be
            added onto all the rewards beyond the timestep `t`.
            If given as a tuple `(t0, t1, b)`, a partial (linearly
            increasing towards `b`) alive bonus will be added onto
            all the rewards between the timesteps `t0` and `t1`,
            and a full alive bonus (which equals `b`) will be added
            onto all the rewards beyond the timestep `t1`.
        action_noise_stdev: If given as a real number `s`, then, for
            each generated action, Gaussian noise with standard
            deviation `s` will be sampled, and then this sampled noise
            will be added onto the action.
            If action noise is not desired, then this argument can be
            left as None.
            For sampling the noise, the global random number generator
            of PyTorch on the simulator's device will be used.
        num_episodes: Number of episodes over which each policy will
            be evaluated. The default is 1.
        device: The device in which the population will be kept.
            If you wish to do a single-GPU evolution, we recommend
            setting this to "cuda" (or "cuda:0", or "cuda:1", etc.),
            assuming that the simulator will also instantiate itself
            on that same device.
            Alternatively, if you wish to do a multi-GPU evolution,
            we recommend leaving this as None or setting this to "cpu",
            so that the main population will be kept on the cpu
            and the remote actors will perform their evaluations on
            the GPUs that are assigned to them.
        num_actors: Number of actors to create for parallelized
            evaluation of the solutions.
            Certain string values are also accepted.
            When given as "max" or as "num_cpus", the number of actors
            will be equal to the number of all available CPUs in the ray
            cluster.
            When given as "num_gpus", the number of actors will be
            equal to the number of all available GPUs in the ray
            cluster, and each actor will be assigned a GPU.
            When given as "num_devices", the number of actors will be
            equal to the minimum among the number of CPUs and the number
            of GPUs available in the cluster (or will be equal to the
            number of CPUs if there is no GPU), and each actor will be
            assigned a GPU (if available).
            If `num_actors` is given as "num_gpus" or "num_devices",
            the argument `num_gpus_per_actor` must not be used,
            and the `actor_config` dictionary must not contain the
            key "num_gpus".
            If `num_actors` is given as something other than "num_gpus"
            or "num_devices", and if you wish to assign GPUs to each
            actor, then please see the argument `num_gpus_per_actor`.
        num_gpus_per_actor: Number of GPUs to be assigned to each
            actor. This can be an integer or a float (for when you
            wish to assign fractional amounts of GPUs to actors).
            When `num_actors` has the special value "num_devices",
            the argument `num_gpus_per_actor` is expected to be left as
            None.
        num_subbatches: For when there are multiple actors, you can
            set this to an integer n if you wish the population
            to be divided exactly into n sub-batches. The actors, as they
            finish their currently assigned sub-batch of solutions,
            will pick the next un-evaluated sub-batch.
            If you specify too large numbers for this argument, then
            each sub-batch will be smaller.
            When working with vectorized simulators on GPU, having too
            many and too small sub-batches can hurt the performance.
            This argument can be left as None, in which case, assuming
            that `subbatch_size` is also None, the population will be
            split into m sub-batches, m being the number of actors.
        subbatch_size: For when there are multiple actors, you can
            set this to an integer n if you wish the population to be
            divided into sub-batches in such a way that each sub-batch
            will consist of exactly n solutions. The actors, as they
            finish their currently assigned sub-batch of solutions,
            will pick the next un-evaluated sub-batch.
            If you specify too small numbers for this argument, then
            there will be many sub-batches, each sub-batch having a
            small number of solutions.
            When working with vectorized simulators on GPU, having too
            many and too small sub-batches can hurt the performance.
            This argument can be left as None, in which case, assuming
            that `num_subbatches` is also None, the population will be
            split into m sub-batches, m being the number of actors.
        actor_config: Additional configuration to be used when creating
            each actor with the help of `ray` library.
            Can be left as None if additional configuration is not needed.
    """

    # Store the string or the Callable that will be used to generate the reinforcement learning environment.
    self._env_maker = env

    # Declare the variable which will store the environment.
    self._env: Optional[TorchWrapper] = None

    # Declare the variable which will store the batch size of the vectorized environment.
    self._num_envs: Optional[int] = None

    # Store the upper bound (if any) regarding how many environments can exist at the same time.
    self._max_num_envs: Optional[int] = None if max_num_envs is None else int(max_num_envs)

    # Actor-specific upper bound regarding how many environments can exist at the same time.
    # This variable will be filled by the `_parallelize(...)` method.
    self._actor_max_num_envs: Optional[int] = None

    # Declare the variable which stores whether or not we properly initialized the `_actor_max_num_envs` variable.
    self._actor_max_num_envs_ready: bool = False

    # Store the additional configurations to be used as keyword arguments while instantiating the environment.
    self._env_config: dict = {} if env_config is None else dict(env_config)

    # Declare the variable that will store the device of the simulator.
    # This variable will be filled when the first observation is received from the environment.
    # The device of the observation array received from the environment will determine the value of this variable.
    self._simulator_device: Optional[torch.device] = None

    # Store the neural network architecture (that might be a string or an `nn.Module` instance).
    self._architecture = network

    if network_args is None:
        # If `network_args` is given as None, change it to an empty dictionary
        network_args = {}

    if isinstance(network, str):
        # If the network is given as a string, then we will need the values for the constants `obs_length`,
        # `act_length`, and `obs_space`. To obtain those values, we use our helper function
        # `_env_constants_for_str_net(...)` which temporarily instantiates the specified environment and returns
        # its needed constants.
        env_constants = _env_constants_for_str_net(self._env_maker, **(self._env_config))
    elif isinstance(network, nn.Module):
        # If the network is an already instantiated nn.Module, then we do not prepare any pre-defined constants.
        env_constants = {}
    else:
        # If the network is given as a Callable, then we will need the values for the constants `obs_length`,
        # `act_length`, and `obs_space`. To obtain those values, we use our helper function
        # `_env_constants_for_callable_net(...)` which temporarily instantiates the specified environment and
        # returns its needed constants.
        env_constants = _env_constants_for_callable_net(self._env_maker, **(self._env_config))

    # Build a `Policy` instance according to the given architecture, and store it.
    if isinstance(network, str):
        instantiated_net = str_to_net(network, **{**env_constants, **network_args})
    elif isinstance(network, nn.Module):
        instantiated_net = network
    else:
        instantiated_net = pass_info_if_needed(network, env_constants)(**network_args)
    self._policy = Policy(instantiated_net)

    # Store the boolean which indicates whether or not there will be observation normalization.
    self._observation_normalization = bool(observation_normalization)

    # Declare the variables that will store the observation-related stats if observation normalization is enabled.
    self._obs_stats: Optional[RunningNorm] = None
    self._collected_stats: Optional[RunningNorm] = None

    # Store the number of episodes configuration given by the user.
    self._num_episodes = int(num_episodes)

    # Store the `decrease_rewards_by` configuration given by the user.
    self._decrease_rewards_by = None if decrease_rewards_by is None else float(decrease_rewards_by)

    if alive_bonus_schedule is None:
        # If `alive_bonus_schedule` argument is None, then we store it as None as well.
        self._alive_bonus_schedule = None
    else:
        # This is the case where the user has specified an `alive_bonus_schedule`.
        alive_bonus_schedule = list(alive_bonus_schedule)
        alive_bonus_schedule_length = len(alive_bonus_schedule)
        if alive_bonus_schedule_length == 2:
            # If `alive_bonus_schedule` was given as a 2-element sequence (t, b), then store it as (t, t, b).
            # This means that the partial alive bonus time window starts and ends at t, therefore, there will
            # be no alive bonus until t, and beginning with t, there will be full alive bonus.
            self._alive_bonus_schedule = [
                int(alive_bonus_schedule[0]),
                int(alive_bonus_schedule[0]),
                float(alive_bonus_schedule[1]),
            ]
        elif alive_bonus_schedule_length == 3:
            # If `alive_bonus_schedule` was given as a 3-element sequence (t0, t1, b), then store those 3
            # elements.
            self._alive_bonus_schedule = [
                int(alive_bonus_schedule[0]),
                int(alive_bonus_schedule[1]),
                float(alive_bonus_schedule[2]),
            ]
        else:
            # `alive_bonus_schedule` sequences with unrecognized lengths trigger an error.
            raise ValueError(
                f"Received an invalid number of elements as the alive bonus schedule."
                f" Expected 2 or 3 items, but got these: {alive_bonus_schedule}"
                f" (having a length of {alive_bonus_schedule_length})."
            )

    # If `action_noise_stdev` is specified, store it.
    self._action_noise_stdev = None if action_noise_stdev is None else float(action_noise_stdev)

    # Initialize the counters for the number of simulator interactions and the number of episodes.
    self._interaction_count: int = 0
    self._episode_count: int = 0

    # Call the superclass
    super().__init__(
        objective_sense="max",
        initial_bounds=(-0.00001, 0.00001),
        solution_length=self._policy.parameter_length,
        device=device,
        dtype=torch.float32,
        num_actors=num_actors,
        num_gpus_per_actor=num_gpus_per_actor,
        actor_config=actor_config,
        num_subbatches=num_subbatches,
        subbatch_size=subbatch_size,
    )

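A minimal, hedged usage sketch for this constructor (assuming `evotorch` is installed and the "Humanoid-v4" environment is available; the network structure string follows the conventions of `evotorch.neuroevolution.net.str_to_net(...)`):

from evotorch.neuroevolution import VecGymNE

problem = VecGymNE(
    env="Humanoid-v4",                         # a classical gym environment
    network="Linear(obs_length, act_length)",  # a linear policy
    observation_normalization=True,
    num_episodes=1,
    device="cpu",  # keep the main population on the cpu
)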
get_env(self)

Get the gym environment.

Returns:

- Optional[gymnasium.core.Env]: The gym environment if it is built. If not built yet, None.

Source code in evotorch/neuroevolution/vecgymne.py
def get_env(self) -> Optional[gym.Env]:
    """
    Get the gym environment.

    Returns:
        The gym environment if it is built. If not built yet, None.
    """
    return self._env

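Because the vectorized environment is built lazily (see `_get_env(...)` in the class source), this accessor may return None. A hedged sketch, where `problem` is a hypothetical `VecGymNE` instance:

env = problem.get_env()
if env is None:
    # Not built yet; it will be instantiated on the first evaluation.
    pass
else:
    print(env.num_envs)  # batch size of the vectorized environment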
get_observation_stats(self)

Get the observation stats

Source code in evotorch/neuroevolution/vecgymne.py
def get_observation_stats(self) -> RunningNorm:
    """Get the observation stats"""
    self._ensure_obsnorm()
    return self._obs_stats

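This accessor is the read side of the synchronization machinery that also includes `pop_observation_stats(...)`, `update_observation_stats(...)`, and `set_observation_stats(...)`. A hedged sketch of the merge pattern between a remote worker and the main problem object (`worker` and `main` are hypothetical `VecGymNE` instances created with `observation_normalization=True`):

# On the worker: take (and clear) the stats collected since the last sync.
delta = worker.pop_observation_stats()

# On the main problem object: fold the worker's delta into the global stats.
if delta is not None:
    main.update_observation_stats(delta.to("cpu"))

# The merged stats can then be pushed back to the worker.
worker.set_observation_stats(main.get_observation_stats())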
make_net(self, solution)

Make a new policy network parameterized by the given solution. Note that this parameterized network assumes that the observation is already normalized, and it does not do action clipping to ensure that the generated actions are within valid bounds.

To have a policy network which has its own observation normalization and action clipping layers, please see the method to_policy(...).

Parameters:

- solution (Iterable, required): The solution which stores the parameters. This can be a Solution instance, or a 1-dimensional tensor, or any Iterable of real numbers.

Returns:

- Module: The policy network, as a PyTorch module.

Source code in evotorch/neuroevolution/vecgymne.py
def make_net(self, solution: Iterable) -> nn.Module:
    """
    Make a new policy network parameterized by the given solution.
    Note that this parameterized network assumes that the observation
    is already normalized, and it does not do action clipping to ensure
    that the generated actions are within valid bounds.

    To have a policy network which has its own observation normalization
    and action clipping layers, please see the method `to_policy(...)`.

    Args:
        solution: The solution which stores the parameters.
            This can be a Solution instance, or a 1-dimensional tensor,
            or any Iterable of real numbers.
    Returns:
        The policy network, as a PyTorch module.
    """
    return self.to_policy(solution, with_wrapper_modules=False)

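A hedged sketch contrasting `make_net(...)` with `to_policy(...)`; `problem` and `best_solution` are hypothetical placeholders:

# Bare network: assumes observations are already normalized,
# and performs no action clipping.
raw_net = problem.make_net(best_solution)

# Wrapped policy: observation normalization and action clipping included.
policy = problem.to_policy(best_solution, with_wrapper_modules=True)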
pop_observation_stats(self)

Get and clear the collected observation stats

Source code in evotorch/neuroevolution/vecgymne.py
def pop_observation_stats(self) -> RunningNorm:
    """Get and clear the collected observation stats"""
    self._ensure_obsnorm()
    result = self._collected_stats
    self._collected_stats = None
    return result

save_solution(self, solution, fname)

Save the solution into a pickle file. Among the saved data within the pickle file are the solution (as a PyTorch tensor), the policy (as a torch.nn.Module instance), and observation stats (if any).

Parameters:

    solution (Iterable, required): The solution to be saved. This can be a PyTorch tensor, a Solution instance, or any Iterable.
    fname (Union[str, pathlib.Path], required): The file name of the pickle file to be created.
Source code in evotorch/neuroevolution/vecgymne.py
def save_solution(self, solution: Iterable, fname: Union[str, Path]):
    """
    Save the solution into a pickle file.
    Among the saved data within the pickle file are the solution
    (as a PyTorch tensor), the policy (as a `torch.nn.Module` instance),
    and observation stats (if any).

    Args:
        solution: The solution to be saved. This can be a PyTorch tensor,
            a `Solution` instance, or any `Iterable`.
        fname: The file name of the pickle file to be created.
    """

    # Convert the solution to a PyTorch tensor on the cpu.
    if isinstance(solution, torch.Tensor):
        solution = solution.to("cpu")
    elif isinstance(solution, Solution):
        solution = solution.values.clone().to("cpu")
    else:
        solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

    if isinstance(solution, ReadOnlyTensor):
        solution = solution.as_subclass(torch.Tensor)

    # Store the solution and the policy.
    result = {
        "solution": solution,
        "policy": self.to_policy(solution),
    }

    # If available, store the observation stats.
    if self.observation_normalization and (self._obs_stats is not None):
        result["obs_mean"] = self._obs_stats.mean.to("cpu")
        result["obs_stdev"] = self._obs_stats.stdev.to("cpu")
        result["obs_sum"] = self._obs_stats.sum.to("cpu")
        result["obs_sum_of_squares"] = self._obs_stats.sum_of_squares.to("cpu")

    # Some additional data.
    result["interaction_count"] = self.interaction_count
    result["episode_count"] = self.episode_count
    result["time"] = datetime.now()

    if isinstance(self._env_maker, str):
        # If the environment was specified via a string, store the string.
        result["env"] = self._env_maker

    # Store the network architecture.
    result["architecture"] = self._architecture

    # Save the dictionary which stores the data.
    with open(fname, "wb") as f:
        pickle.dump(result, f)
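
Because save_solution(...) writes a plain pickle of a dictionary, the result can be loaded back with the standard library alone. A hedged sketch (the file name and variable names are illustrative):

import pickle

problem.save_solution(best_solution, "saved_policy.pickle")

with open("saved_policy.pickle", "rb") as f:
    saved = pickle.load(f)

policy = saved["policy"]    # a ready-to-use torch.nn.Module
params = saved["solution"]  # the raw parameter tensor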

set_episode_count(self, n)

Set the episode count manually.

Source code in evotorch/neuroevolution/vecgymne.py
def set_episode_count(self, n: int):
    """
    Set the episode count manually.
    """
    self._episode_count = int(n)

set_interaction_count(self, n)

Set the interaction count manually.

Source code in evotorch/neuroevolution/vecgymne.py
def set_interaction_count(self, n: int):
    """
    Set the interaction count manually.
    """
    self._interaction_count = int(n)
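
These two setters are mainly useful when resuming an experiment, so that the counters continue from a previously saved state. A hedged sketch, assuming saved is a dictionary produced by save_solution(...):

problem.set_interaction_count(saved["interaction_count"])
problem.set_episode_count(saved["episode_count"])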

set_observation_stats(self, rn)

Set the observation stats

Source code in evotorch/neuroevolution/vecgymne.py
def set_observation_stats(self, rn: RunningNorm):
    """Set the observation stats"""
    self._ensure_obsnorm()
    self._obs_stats = rn

to_policy(self, solution, *, with_wrapper_modules=True)

Convert the given solution to a policy.

Parameters:

    solution (Iterable, required): A solution which can be given as a torch.Tensor, as a Solution, or as any Iterable.
    with_wrapper_modules (bool, default: True): Whether or not to wrap the policy module with helper modules so that observations are normalized and actions are clipped to be within the correct boundaries. The default (and recommended) value is True.

Returns:

    Module: The policy, as a torch.nn.Module instance.

Source code in evotorch/neuroevolution/vecgymne.py
def to_policy(self, solution: Iterable, *, with_wrapper_modules: bool = True) -> nn.Module:
    """
    Convert the given solution to a policy.

    Args:
        solution: A solution which can be given as a `torch.Tensor`, as a
            `Solution`, or as any `Iterable`.
        with_wrapper_modules: Whether or not to wrap the policy module
            with helper modules so that observations are normalized
            and actions are clipped to be within the correct boundaries.
            The default and the recommended value is True.
    Returns:
        The policy, as a `torch.nn.Module` instance.
    """
    # Get the gym environment
    env = self._get_env(1)

    # Get the action space and its lower and upper bounds (used below for action clipping).
    act_space = env.single_action_space
    low = act_space.low
    high = act_space.high

    # If the lower and upper bounds are not -inf and +inf respectively, then the actions need clipping.
    needs_clipping = _numpy_arrays_specify_bounds(low, high)

    # Convert the solution to a PyTorch tensor on cpu.
    if isinstance(solution, torch.Tensor):
        solution = solution.to("cpu")
    elif isinstance(solution, Solution):
        solution = solution.values.clone().to("cpu")
    else:
        solution = torch.as_tensor(solution, dtype=torch.float32, device="cpu")

    # Convert the internally stored policy to a PyTorch module.
    result = self._policy.to_torch_module(solution)

    if with_wrapper_modules:
        if self.observation_normalization and (self._obs_stats is not None):
            # If observation normalization is needed and there are collected observation stats, then we wrap the
            # policy with an ObsNormWrapperModule.
            result = ObsNormWrapperModule(result, self._obs_stats)

        if needs_clipping:
            # If clipping is needed, then we wrap the policy with an ActClipWrapperModule
            result = ActClipWrapperModule(result, act_space)

    return result
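
A typical hedged usage (searcher, problem, and observation are assumed names) is to turn the best solution found by a search algorithm into a deployable module:

import torch

policy = problem.to_policy(searcher.status["best"])

# For a feed-forward policy; a recurrent policy may additionally take and return a hidden state.
action = policy(torch.as_tensor(observation, dtype=torch.float32))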

update_observation_stats(self, rn)

Update the observation stats via another RunningNorm instance

Source code in evotorch/neuroevolution/vecgymne.py
def update_observation_stats(self, rn: RunningNorm):
    """Update the observation stats via another RunningNorm instance"""
    self._ensure_obsnorm()
    if self._obs_stats is None:
        self._obs_stats = rn
    else:
        self._obs_stats.update(rn)
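
Together with get_observation_stats(...), set_observation_stats(...), and pop_observation_stats(...), this method supports a collect-and-merge workflow for observation normalization across parallel workers. A hedged sketch (main_problem and workers are assumed names):

# Merge the stats each worker collected since the last synchronization.
for worker in workers:
    rn = worker.pop_observation_stats()
    if rn is not None:
        main_problem.update_observation_stats(rn)

# Broadcast the merged stats so all workers normalize observations consistently.
synced = main_problem.get_observation_stats()
for worker in workers:
    worker.set_observation_stats(synced)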