Vecrl

This namespace provides various vectorized reinforcement learning utilities.

BaseVectorEnv

Bases: VectorEnv

A base class for vectorized gymnasium environments.

In gymnasium 0.29.x, the __init__(...) method of the base class gymnasium.vector.VectorEnv expects the arguments num_envs, observation_space, and action_space, and then prepares the instance attributes num_envs, single_observation_space, single_action_space, observation_space, and action_space according to the initialization arguments it receives.

It appears that with gymnasium 1.x, this API is changing, and gymnasium.vector.VectorEnv strictly expects no positional arguments. This BaseVectorEnv class is meant as a base class which preserves the behavior of gymnasium 0.29.x, meaning that it will expect the arguments and prepare the attributes mentioned above.

Please note, however, that this BaseVectorEnv implementation can only work with environments whose single observation and single action spaces are either Box or Discrete.
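
Below is a minimal usage sketch. It assumes a gymnasium version whose `gymnasium.vector.VectorEnv.__init__()` accepts no arguments (the 1.x series), and it only aims to show which attributes get prepared; the commented shapes follow gymnasium's usual space-batching convention and are not guaranteed by this page.

```python
import gymnasium as gym
import numpy as np

from evotorch.neuroevolution.net.vecrl import BaseVectorEnv

single_obs_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
single_act_space = gym.spaces.Discrete(3)

venv = BaseVectorEnv(4, single_obs_space, single_act_space)

print(venv.num_envs)                  # 4
print(venv.single_observation_space)  # the Box given above
print(venv.single_action_space)       # the Discrete given above
print(venv.observation_space)         # batched observation space (leftmost size 4 expected)
print(venv.action_space)              # batched action space (leftmost size 4 expected)
```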

Source code in evotorch/neuroevolution/net/vecrl.py
class BaseVectorEnv(gym.vector.VectorEnv):
    """
    A base class for vectorized gymnasium environments.

    In gymnasium 0.29.x, the `__init__(...)` method of the base class
    `gymnasium.vector.VectorEnv` expects the arguments `num_envs`,
    `observation_space`, and `action_space`, and then prepares the instance
    attributes `num_envs`, `single_observation_space`, `single_action_space`,
    `observation_space`, and `action_space` according to the initialization
    arguments it receives.

    It appears that with gymnasium 1.x, this API is changing, and
    `gymnasium.vector.VectorEnv` strictly expects no positional arguments.
    This `BaseVectorEnv` class is meant as a base class which preserves
    the behavior of gymnasium 0.29.x, meaning that it will expect the
    arguments and prepare the attributes mentioned above.

    Please note, however, that this `BaseVectorEnv` implementation
    can only work with environments whose single observation and single
    action spaces are either `Box` or `Discrete`.
    """

    def __init__(self, num_envs: int, observation_space: Space, action_space: Space):
        """
        `__init__(...)`: Initialize the vectorized environment.

        Args:
            num_envs: Number of sub-environments handled by this `BaseVectorEnv`.
            observation_space: Observation space of a single sub-environment.
                This can only be given as an instance of type
                `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`.
            action_space: Action space of a single sub-environment.
                This can only be given as an instance of type
                `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`.
        """
        super().__init__()
        self.num_envs = int(num_envs)
        self.single_observation_space = observation_space
        self.single_action_space = action_space
        self.observation_space = _batch_space(self.single_observation_space, self.num_envs)
        self.action_space = _batch_space(self.single_action_space, self.num_envs)

__init__(num_envs, observation_space, action_space)

__init__(...): Initialize the vectorized environment.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `num_envs` | `int` | Number of sub-environments handled by this `BaseVectorEnv`. | required |
| `observation_space` | `Space` | Observation space of a single sub-environment. This can only be given as an instance of type `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`. | required |
| `action_space` | `Space` | Action space of a single sub-environment. This can only be given as an instance of type `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`. | required |
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(self, num_envs: int, observation_space: Space, action_space: Space):
    """
    `__init__(...)`: Initialize the vectorized environment.

    Args:
        num_envs: Number of sub-environments handled by this `BaseVectorEnv`.
        observation_space: Observation space of a single sub-environment.
            This can only be given as an instance of type
            `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`.
        action_space: Action space of a single sub-environment.
            This can only be given as an instance of type
            `gymnasium.spaces.Box` or `gymnasium.spaces.Discrete`.
    """
    super().__init__()
    self.num_envs = int(num_envs)
    self.single_observation_space = observation_space
    self.single_action_space = action_space
    self.observation_space = _batch_space(self.single_observation_space, self.num_envs)
    self.action_space = _batch_space(self.single_action_space, self.num_envs)

Policy

A Policy for deciding the actions for a reinforcement learning environment.

This can be seen as a stateful wrapper around a PyTorch module.

Let us assume that we have the following PyTorch module:

import torch
from torch import nn

net = nn.Linear(5, 8)

which has 48 parameters (when all the parameters are flattened). Let us randomly generate a parameter vector for our module net:

parameters = torch.randn(48)

We can now prepare a policy:

policy = Policy(net)
policy.set_parameters(parameters)

If we generate a random observation:

observation = torch.randn(5)

We can receive our action as follows:

action = policy(observation)

If the PyTorch module that we wish to wrap is a recurrent network (i.e. a network which expects an optional second argument for the hidden state, and returns a second value which represents the updated hidden state), then, the hidden state is automatically managed by the Policy instance.

Let us assume that we have a recurrent network named recnet.

policy = Policy(recnet)
policy.set_parameters(parameters_of_recnet)

In this case, because the hidden state of the network is internally managed, the usage is the same as in our previous non-recurrent example:

action = policy(observation)

When using a recurrent module on multiple episodes, it is important to reset the hidden state of the network. This is achieved by the reset method:

policy.reset()
action1 = policy(observation1)

# action2 will be computed with the hidden state generated by the
# previous forward-pass.
action2 = policy(observation2)

policy.reset()

# action3 will be computed according to the renewed hidden state.
action3 = policy(observation3)

Both for non-recurrent and recurrent networks, it is possible to perform vectorized operations. For now, let us return to our first non-recurrent example:

net = nn.Linear(5, 8)

Instead of generating only one parameter vector, we now generate a batch of parameter vectors. Let us say that our batch size is 10:

batch_of_parameters = torch.randn(10, 48)

Like we did in the non-batched examples, we can do:

policy = Policy(net)
policy.set_parameters(batch_of_parameters)

Because we are now in batched mode, policy expects a batch of observations and will return a batch of actions:

batch_of_observations = torch.randn(10, 5)
batch_of_actions = policy(batch_of_observations)

When doing vectorized reinforcement learning with a recurrent module, it can be the case that only some of the environments are finished, and therefore it is necessary to reset the hidden states associated with those environments only. The reset(...) method of Policy accepts an `indices` argument to specify which of the recurrent network instances are to be reset. For example, if the episodes of the environments with indices 2 and 5 are about to restart (and therefore we wish to reset the states of the networks with indices 2 and 5), then, we can do:

policy.reset(torch.tensor([2, 5]))
Source code in evotorch/neuroevolution/net/vecrl.py
class Policy:
    """
    A Policy for deciding the actions for a reinforcement learning environment.

    This can be seen as a stateful wrapper around a PyTorch module.

    Let us assume that we have the following PyTorch module:

    ```python
    import torch
    from torch import nn

    net = nn.Linear(5, 8)
    ```

    which has 48 parameters (when all the parameters are flattened).
    Let us randomly generate a parameter vector for our module `net`:

    ```python
    parameters = torch.randn(48)
    ```

    We can now prepare a policy:

    ```python
    policy = Policy(net)
    policy.set_parameters(parameters)
    ```

    If we generate a random observation:

    ```python
    observation = torch.randn(5)
    ```

    We can receive our action as follows:

    ```python
    action = policy(observation)
    ```

    If the PyTorch module that we wish to wrap is a recurrent network (i.e.
    a network which expects an optional second argument for the hidden state,
    and returns a second value which represents the updated hidden state),
    then, the hidden state is automatically managed by the Policy instance.

    Let us assume that we have a recurrent network named `recnet`.

    ```python
    policy = Policy(recnet)
    policy.set_parameters(parameters_of_recnet)
    ```

    In this case, because the hidden state of the network is internally
    managed, the usage is the same as in our previous non-recurrent
    example:

    ```python
    action = policy(observation)
    ```

    When using a recurrent module on multiple episodes, it is important
    to reset the hidden state of the network. This is achieved by the
    reset method:

    ```python
    policy.reset()
    action1 = policy(observation1)

    # action2 will be computed with the hidden state generated by the
    # previous forward-pass.
    action2 = policy(observation2)

    policy.reset()

    # action3 will be computed according to the renewed hidden state.
    action3 = policy(observation3)
    ```

    Both for non-recurrent and recurrent networks, it is possible to
    perform vectorized operations. For now, let us return to our
    first non-recurrent example:

    ```python
    net = nn.Linear(5, 8)
    ```

    Instead of generating only one parameter vector, we now generate
    a batch of parameter vectors. Let us say that our batch size is 10:

    ```python
    batch_of_parameters = torch.randn(10, 48)
    ```

    Like we did in the non-batched examples, we can do:

    ```python
    policy = Policy(net)
    policy.set_parameters(batch_of_parameters)
    ```

    Because we are now in batched mode, `policy` expects a batch
    of observations and will return a batch of actions:

    ```python
    batch_of_observations = torch.randn(10, 5)
    batch_of_actions = policy(batch_of_observations)
    ```

    When doing vectorized reinforcement learning with a recurrent module,
    it can be the case that only some of the environments are finished,
    and therefore it is necessary to reset the hidden states associated
    with those environments only. The `reset(...)` method of Policy
    accepts an `indices` argument to specify which of the recurrent network
    instances are to be reset. For example, if the episodes of the
    environments with indices 2 and 5 are about to restart (and therefore
    we wish to reset the states of the networks with indices 2 and 5),
    then, we can do:

    ```python
    policy.reset(torch.tensor([2, 5]))
    ```
    """

    def __init__(self, net: Union[str, Callable, nn.Module], **kwargs):
        """
        `__init__(...)`: Initialize the Policy.

        Args:
            net: The network to be wrapped by the Policy object.
                This can be a string, a Callable (e.g. a `torch.nn.Module`
                subclass), or a `torch.nn.Module` instance.
                When this argument is a string, the network will be
                created with the help of the function
                `evotorch.neuroevolution.net.str_to_net(...)` and then
                wrapped. Please see the `str_to_net(...)` function's
                documentation for details regarding how a network structure
                can be expressed via strings.
            kwargs: Expected in the form of additional keyword arguments,
                these keyword arguments will be passed to the provided
                Callable object (if the argument `net` is a Callable)
                or to `str_to_net(...)` (if the argument `net` is a string)
                at the moment of generating the network.
                If the argument `net` is a `torch.nn.Module` instance,
                having any additional keyword arguments will trigger an
                error, because the network is already instantiated and
                therefore, it is not possible to pass these keyword arguments.
        """
        from ..net import str_to_net
        from ..net.functional import ModuleExpectingFlatParameters, make_functional_module

        if isinstance(net, str):
            self.__module = str_to_net(net, **kwargs)
        elif isinstance(net, nn.Module):
            if len(kwargs) > 0:
                raise ValueError(
                    f"When the network is given as an `nn.Module` instance, extra network arguments cannot be used"
                    f" (because the network is already instantiated)."
                    f" However, these extra keyword arguments were received: {kwargs}."
                )
            self.__module = net
        elif isinstance(net, Callable):
            self.__module = net(**kwargs)
        else:
            raise TypeError(
                f"The class `Policy` expected a string or an `nn.Module` instance, or a Callable, but received {net}"
                f" (whose type is {type(net)})."
            )

        self.__fmodule: ModuleExpectingFlatParameters = make_functional_module(self.__module)
        self.__state: Any = None
        self.__parameters: Optional[torch.Tensor] = None

    def set_parameters(self, parameters: torch.Tensor, indices: Optional[MaskOrIndices] = None, *, reset: bool = True):
        """
        Set the parameters of the policy.

        Args:
            parameters: A 1-dimensional or a 2-dimensional tensor containing
                the flattened parameters to be used with the neural network.
                If the given parameters are two-dimensional, then, given that
                the leftmost size of the parameter tensor is `n`, the
                observations will be expected in a batch with leftmost size
                `n`, and the returned actions will also be in a batch,
                again with the leftmost size `n`.
            indices: For when the parameters were previously given via a
                2-dimensional tensor, provide this argument if you would like
                to change only some rows of the previously given parameters.
                For example, if `indices` is given as `torch.tensor([2, 4])`
                and the argument `parameters` is given as a 2-dimensional
                tensor with leftmost size 2, then the rows with indices
                2 and 4 will be replaced by these new parameters provided
                via the argument `parameters`.
            reset: If given as True, the hidden states of the networks whose
                parameters just changed will be reset. If `indices` was not
                provided at all, then this means that the parameters of all
                networks are modified, in which case, all the hidden states
                will be reset.
                If given as False, no such resetting will be done.
        """
        if self.__parameters is None:
            if indices is not None:
                raise ValueError(
                    "The argument `indices` can be used only if network parameters were previously specified."
                    " However, it seems that the method `set_parameters(...)` was not called before."
                )
            self.__parameters = parameters
        else:
            if indices is None:
                self.__parameters = parameters
            else:
                self.__parameters[indices] = parameters

        if reset:
            self.reset(indices)

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        """
        Pass the given observations through the network.

        Args:
            x: The observations, as a PyTorch tensor.
                If the parameters were given (via the method
                `set_parameters(...)`) as a 1-dimensional tensor, then this
                argument is expected to store a single observation.
                If the parameters were given as a 2-dimensional tensor,
                then, this argument is expected to store a batch of
                observations, and the leftmost size of this observation
                tensor must match with the leftmost size of the parameter
                tensor.
        Returns:
            The output tensor, which represents the action to take.
        """
        if self.__parameters is None:
            raise ValueError("Please use the method `set_parameters(...)` before calling the policy.")

        if self.__state is None:
            further_args = (x,)
        else:
            further_args = (x, self.__state)

        parameters = self.__parameters
        ndim = parameters.ndim
        if ndim == 1:
            result = self.__fmodule(parameters, *further_args)
        elif ndim == 2:
            vmapped = vmap(self.__fmodule)
            result = vmapped(parameters, *further_args)
        else:
            raise ValueError(
                f"Expected the parameters as a 1 or 2 dimensional tensor."
                f" However, the received parameters tensor has {ndim} dimensions."
            )

        if isinstance(result, torch.Tensor):
            return result
        elif isinstance(result, tuple):
            result, state = result
            self.__state = state
            return result
        else:
            raise TypeError(f"The torch module used by the Policy returned an unexpected object: {result}")

    def reset(self, indices: Optional[MaskOrIndices] = None, *, copy: bool = True):
        """
        Reset the hidden states, if the contained module is a recurrent network.

        Args:
            indices: Optionally a sequence of integers or a sequence of
                booleans, specifying which networks' states will be
                reset. If left as None, then the states of all the networks
                will be reset.
            copy: When `indices` is given as something other than None,
                if `copy` is given as True, then the resetting will NOT
                be done in-place. Instead, a new copy of the hidden state
                will first be created, and then the specified regions
                of this new copy will be cleared, and then finally this
                modified copy will be declared as the new hidden state.
                It is a common practice for recurrent neural network
                implementations to return the same tensor both as its
                output and as (part of) its hidden state. With `copy=False`,
                the resetting would be done in-place, and the action
                tensor could be involuntarily reset as well.
                This in-place modification could cause silent bugs
                if the unintended modification on the action tensor
                happens BEFORE the action is sent to the reinforcement
                learning environment.
                To prevent such situations, the default value for the argument
                `copy` is True.
        """
        if indices is None:
            self.__state = None
        else:
            if self.__state is not None:
                with torch.no_grad():
                    if copy:
                        self.__state = deepcopy(self.__state)
                    reset_tensors(self.__state, indices)

    @property
    def parameters(self) -> torch.Tensor:
        """
        The currently used parameters.
        """
        return self.__parameters

    @property
    def h(self) -> Optional[torch.Tensor]:
        """
        The hidden state of the contained recurrent network, if any.

        If the contained recurrent network did not generate a hidden state
        yet, or if the contained network is not recurrent, then the result
        will be None.
        """
        return self.__state

    @property
    def parameter_length(self) -> int:
        """
        Length of the parameter tensor.
        """
        return self.__fmodule.parameter_length

    @property
    def wrapped_module(self) -> nn.Module:
        """
        The wrapped `torch.nn.Module` instance.
        """
        return self.__module

    def to_torch_module(self, parameter_vector: torch.Tensor) -> nn.Module:
        """
        Get a copy of the contained network, parameterized as specified.

        Args:
            parameter_vector: The parameters to be used by the new network.
        Returns:
            Copy of the contained network, as a `torch.nn.Module` instance.
        """
        with torch.no_grad():
            net = deepcopy(self.__module).to(parameter_vector.device)
            nnu.vector_to_parameters(parameter_vector, net.parameters())
        return net

h property

The hidden state of the contained recurrent network, if any.

If the contained recurrent network did not generate a hidden state yet, or if the contained network is not recurrent, then the result will be None.

parameter_length property

Length of the parameter tensor.

parameters property

The currently used parameters.

wrapped_module property

The wrapped torch.nn.Module instance.

__call__(x)

Pass the given observations through the network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | `Tensor` | The observations, as a PyTorch tensor. If the parameters were given (via the method `set_parameters(...)`) as a 1-dimensional tensor, then this argument is expected to store a single observation. If the parameters were given as a 2-dimensional tensor, then this argument is expected to store a batch of observations, and the leftmost size of this observation tensor must match the leftmost size of the parameter tensor. | required |
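
As a quick illustration of the shape contract described above (a sketch that reuses the `nn.Linear(5, 8)` example from the class docstring and assumes a `(10, 48)` parameter batch was set via `set_parameters(...)`):

```python
batch_of_observations = torch.randn(10, 5)
batch_of_actions = policy(batch_of_observations)  # expected shape: (10, 8)
```
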
Source code in evotorch/neuroevolution/net/vecrl.py
def __call__(self, x: torch.Tensor) -> torch.Tensor:
    """
    Pass the given observations through the network.

    Args:
        x: The observations, as a PyTorch tensor.
            If the parameters were given (via the method
            `set_parameters(...)`) as a 1-dimensional tensor, then this
            argument is expected to store a single observation.
            If the parameters were given as a 2-dimensional tensor,
            then, this argument is expected to store a batch of
            observations, and the leftmost size of this observation
            tensor must match with the leftmost size of the parameter
            tensor.
    Returns:
        The output tensor, which represents the action to take.
    """
    if self.__parameters is None:
        raise ValueError("Please use the method `set_parameters(...)` before calling the policy.")

    if self.__state is None:
        further_args = (x,)
    else:
        further_args = (x, self.__state)

    parameters = self.__parameters
    ndim = parameters.ndim
    if ndim == 1:
        result = self.__fmodule(parameters, *further_args)
    elif ndim == 2:
        vmapped = vmap(self.__fmodule)
        result = vmapped(parameters, *further_args)
    else:
        raise ValueError(
            f"Expected the parameters as a 1 or 2 dimensional tensor."
            f" However, the received parameters tensor has {ndim} dimensions."
        )

    if isinstance(result, torch.Tensor):
        return result
    elif isinstance(result, tuple):
        result, state = result
        self.__state = state
        return result
    else:
        raise TypeError(f"The torch module used by the Policy returned an unexpected object: {result}")

__init__(net, **kwargs)

__init__(...): Initialize the Policy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `net` | `Union[str, Callable, Module]` | The network to be wrapped by the Policy object. This can be a string, a Callable (e.g. a `torch.nn.Module` subclass), or a `torch.nn.Module` instance. When this argument is a string, the network will be created with the help of the function `evotorch.neuroevolution.net.str_to_net(...)` and then wrapped. Please see the `str_to_net(...)` function's documentation for details regarding how a network structure can be expressed via strings. | required |
| `kwargs` |  | Expected in the form of additional keyword arguments, these keyword arguments will be passed to the provided Callable object (if the argument `net` is a Callable) or to `str_to_net(...)` (if the argument `net` is a string) at the moment of generating the network. If the argument `net` is a `torch.nn.Module` instance, having any additional keyword arguments will trigger an error, because the network is already instantiated and therefore, it is not possible to pass these keyword arguments. | `{}` |
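
Illustrative sketches of the three accepted forms of `net`. The string expression below is only a plausible example; consult the `str_to_net(...)` documentation for the authoritative grammar.

```python
from torch import nn

# 1. An already-instantiated module (no extra keyword arguments allowed):
policy_a = Policy(nn.Linear(5, 8))

# 2. A Callable plus keyword arguments that are forwarded to it:
policy_b = Policy(nn.Linear, in_features=5, out_features=8)

# 3. A string, forwarded to str_to_net(...) together with the keyword arguments
#    (the exact expression below is illustrative, not guaranteed):
policy_c = Policy("Linear(5, 16) >> Tanh() >> Linear(16, 8)")
```
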
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(self, net: Union[str, Callable, nn.Module], **kwargs):
    """
    `__init__(...)`: Initialize the Policy.

    Args:
        net: The network to be wrapped by the Policy object.
            This can be a string, a Callable (e.g. a `torch.nn.Module`
            subclass), or a `torch.nn.Module` instance.
            When this argument is a string, the network will be
            created with the help of the function
            `evotorch.neuroevolution.net.str_to_net(...)` and then
            wrapped. Please see the `str_to_net(...)` function's
            documentation for details regarding how a network structure
            can be expressed via strings.
        kwargs: Expected in the form of additional keyword arguments,
            these keyword arguments will be passed to the provided
            Callable object (if the argument `net` is a Callable)
            or to `str_to_net(...)` (if the argument `net` is a string)
            at the moment of generating the network.
            If the argument `net` is a `torch.nn.Module` instance,
            having any additional keyword arguments will trigger an
            error, because the network is already instantiated and
            therefore, it is not possible to pass these keyword arguments.
    """
    from ..net import str_to_net
    from ..net.functional import ModuleExpectingFlatParameters, make_functional_module

    if isinstance(net, str):
        self.__module = str_to_net(net, **kwargs)
    elif isinstance(net, nn.Module):
        if len(kwargs) > 0:
            raise ValueError(
                f"When the network is given as an `nn.Module` instance, extra network arguments cannot be used"
                f" (because the network is already instantiated)."
                f" However, these extra keyword arguments were received: {kwargs}."
            )
        self.__module = net
    elif isinstance(net, Callable):
        self.__module = net(**kwargs)
    else:
        raise TypeError(
            f"The class `Policy` expected a string or an `nn.Module` instance, or a Callable, but received {net}"
            f" (whose type is {type(net)})."
        )

    self.__fmodule: ModuleExpectingFlatParameters = make_functional_module(self.__module)
    self.__state: Any = None
    self.__parameters: Optional[torch.Tensor] = None

reset(indices=None, *, copy=True)

Reset the hidden states, if the contained module is a recurrent network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `indices` | `Optional[MaskOrIndices]` | Optionally a sequence of integers or a sequence of booleans, specifying which networks' states will be reset. If left as None, then the states of all the networks will be reset. | `None` |
| `copy` | `bool` | When `indices` is given as something other than None, if `copy` is given as True, then the resetting will NOT be done in-place. Instead, a new copy of the hidden state will first be created, then the specified regions of this new copy will be cleared, and finally this modified copy will be declared as the new hidden state. It is a common practice for recurrent neural network implementations to return the same tensor both as its output and as (part of) its hidden state. With `copy=False`, the resetting would be done in-place, and the action tensor could be involuntarily reset as well. This in-place modification could cause silent bugs if the unintended modification on the action tensor happens BEFORE the action is sent to the reinforcement learning environment. To prevent such situations, the default value for the argument `copy` is True. | `True` |
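
A brief sketch of the resetting modes described above, assuming `policy` wraps a recurrent module and currently manages a batch of sub-policies:

```python
# Reset only the hidden states of the sub-policies at indices 2 and 5,
# using the safe copy-on-reset default.
policy.reset(torch.tensor([2, 5]))

# In-place variant; only safe if the hidden state is not aliased with an
# action tensor that still needs to be sent to the environment.
policy.reset(torch.tensor([2, 5]), copy=False)

# Drop the hidden states of all sub-policies.
policy.reset()
```
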
Source code in evotorch/neuroevolution/net/vecrl.py
def reset(self, indices: Optional[MaskOrIndices] = None, *, copy: bool = True):
    """
    Reset the hidden states, if the contained module is a recurrent network.

    Args:
        indices: Optionally a sequence of integers or a sequence of
            booleans, specifying which networks' states will be
            reset. If left as None, then the states of all the networks
            will be reset.
        copy: When `indices` is given as something other than None,
            if `copy` is given as True, then the resetting will NOT
            be done in-place. Instead, a new copy of the hidden state
            will first be created, and then the specified regions
            of this new copy will be cleared, and then finally this
            modified copy will be declared as the new hidden state.
            It is a common practice for recurrent neural network
            implementations to return the same tensor both as its
            output and as (part of) its hidden state. With `copy=False`,
            the resetting would be done in-place, and the action
            tensor could be involuntarily reset as well.
            This in-place modification could cause silent bugs
            if the unintended modification on the action tensor
            happens BEFORE the action is sent to the reinforcement
            learning environment.
            To prevent such situations, the default value for the argument
            `copy` is True.
    """
    if indices is None:
        self.__state = None
    else:
        if self.__state is not None:
            with torch.no_grad():
                if copy:
                    self.__state = deepcopy(self.__state)
                reset_tensors(self.__state, indices)

set_parameters(parameters, indices=None, *, reset=True)

Set the parameters of the policy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `parameters` | `Tensor` | A 1-dimensional or a 2-dimensional tensor containing the flattened parameters to be used with the neural network. If the given parameters are two-dimensional, then, given that the leftmost size of the parameter tensor is `n`, the observations will be expected in a batch with leftmost size `n`, and the returned actions will also be in a batch, again with leftmost size `n`. | required |
| `indices` | `Optional[MaskOrIndices]` | For when the parameters were previously given via a 2-dimensional tensor, provide this argument if you would like to change only some rows of the previously given parameters. For example, if `indices` is given as `torch.tensor([2, 4])` and the argument `parameters` is given as a 2-dimensional tensor with leftmost size 2, then the rows with indices 2 and 4 will be replaced by these new parameters provided via the argument `parameters`. | `None` |
| `reset` | `bool` | If given as True, the hidden states of the networks whose parameters just changed will be reset. If `indices` was not provided at all, then this means that the parameters of all networks are modified, in which case all the hidden states will be reset. If given as False, no such resetting will be done. | `True` |
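
A hedged usage sketch of the row-wise update described above, assuming a 2-dimensional parameter batch was set earlier:

```python
# Replace only rows 2 and 4 of the previously given parameter batch.
# reset=True (the default) also clears the hidden states of those two sub-policies.
new_rows = torch.randn(2, policy.parameter_length)
policy.set_parameters(new_rows, indices=torch.tensor([2, 4]))
```
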
Source code in evotorch/neuroevolution/net/vecrl.py
def set_parameters(self, parameters: torch.Tensor, indices: Optional[MaskOrIndices] = None, *, reset: bool = True):
    """
    Set the parameters of the policy.

    Args:
        parameters: A 1-dimensional or a 2-dimensional tensor containing
            the flattened parameters to be used with the neural network.
            If the given parameters are two-dimensional, then, given that
            the leftmost size of the parameter tensor is `n`, the
            observations will be expected in a batch with leftmost size
            `n`, and the returned actions will also be in a batch,
            again with the leftmost size `n`.
        indices: For when the parameters were previously given via a
            2-dimensional tensor, provide this argument if you would like
            to change only some rows of the previously given parameters.
            For example, if `indices` is given as `torch.tensor([2, 4])`
            and the argument `parameters` is given as a 2-dimensional
            tensor with leftmost size 2, then the rows with indices
            2 and 4 will be replaced by these new parameters provided
            via the argument `parameters`.
        reset: If given as True, the hidden states of the networks whose
            parameters just changed will be reset. If `indices` was not
            provided at all, then this means that the parameters of all
            networks are modified, in which case, all the hidden states
            will be reset.
            If given as False, no such resetting will be done.
    """
    if self.__parameters is None:
        if indices is not None:
            raise ValueError(
                "The argument `indices` can be used only if network parameters were previously specified."
                " However, it seems that the method `set_parameters(...)` was not called before."
            )
        self.__parameters = parameters
    else:
        if indices is None:
            self.__parameters = parameters
        else:
            self.__parameters[indices] = parameters

    if reset:
        self.reset(indices)

to_torch_module(parameter_vector)

Get a copy of the contained network, parameterized as specified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `parameter_vector` | `Tensor` | The parameters to be used by the new network. | required |
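
For instance (a sketch continuing the batched `nn.Linear(5, 8)` example):

```python
# Materialize a standalone torch.nn.Module from one row of the parameter batch.
standalone_net = policy.to_torch_module(policy.parameters[0])
action = standalone_net(torch.randn(5))
```
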
Source code in evotorch/neuroevolution/net/vecrl.py
def to_torch_module(self, parameter_vector: torch.Tensor) -> nn.Module:
    """
    Get a copy of the contained network, parameterized as specified.

    Args:
        parameter_vector: The parameters to be used by the new network.
    Returns:
        Copy of the contained network, as a `torch.nn.Module` instance.
    """
    with torch.no_grad():
        net = deepcopy(self.__module).to(parameter_vector.device)
        nnu.vector_to_parameters(parameter_vector, net.parameters())
    return net

SyncVectorEnv

Bases: BaseVectorEnv

A vectorized gymnasium environment for handling multiple sub-environments.

This is an alternative implementation to the class gymnasium.vector.SyncVectorEnv. This alternative SyncVectorEnv implementation has eager auto-reset.

After taking a step(), any sub-environment whose terminated or truncated signal is True will be immediately subject to resetting, and the returned observation and info will immediately reflect the first state of the new episode. This is compatible with the auto-reset behavior of gymnasium 0.29.x, and is different from the auto-reset behavior introduced in gymnasium 1.x.
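
A minimal usage sketch (assuming gymnasium's `CartPole-v1` environment is available; its Box observations and Discrete actions satisfy the space requirements inherited from `BaseVectorEnv`):

```python
import gymnasium as gym
import numpy as np

from evotorch.neuroevolution.net.vecrl import SyncVectorEnv

vec_env = SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

observations, infos = vec_env.reset(seed=0)
actions = np.stack([vec_env.single_action_space.sample() for _ in range(4)])
observations, rewards, terminated, truncated, infos = vec_env.step(actions)
# Sub-environments that terminated or truncated have already been auto-reset,
# so `observations` holds the first observations of their new episodes.
vec_env.close()
```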

Source code in evotorch/neuroevolution/net/vecrl.py
class SyncVectorEnv(BaseVectorEnv):
    """
    A vectorized gymnasium environment for handling multiple sub-environments.

    This is an alternative implementation to the class `gymnasium.vector.SyncVectorEnv`.
    This alternative SyncVectorEnv implementation has _eager_ auto-reset.

    After taking a step(), any sub-environment whose terminated or truncated
    signal is True will be immediately subject to resetting, and the returned
    observation and info will immediately reflect the first state of the new
    episode. This is compatible with the auto-reset behavior of gymnasium 0.29.x,
    and is different from the auto-reset behavior introduced in gymnasium 1.x.
    """

    def __init__(
        self,
        env_makers: Iterable[gym.Env],
        *,
        empty_info: bool = False,
        num_episodes: Optional[int] = None,
        device: Optional[Union[str, torch.device]] = None,
    ):
        """
        `__init__(...)`: Initialize the `SyncVectorEnv`.

        Args:
            env_makers: An iterable object which stores functions that make
                the sub-environments to be managed by this `SyncVectorEnv`.
                The number of functions within this iterable object
                determines the number of sub-environments that will be
                managed.
            empty_info: Whether or not to ignore the actual `info` dictionaries
                of the sub-environments and report empty `info` dictionaries
                instead. The default is False. Set this as True if you are not
                interested in additional `info`s, and if you wish to save some
                computational cycles by not merging the separate `info`
                dictionaries into a single dictionary.
            num_episodes: Optionally an integer which represents the number
                of episodes one wishes to run for each sub-environment.
                If this `num_episodes` is given as a positive integer `n`,
                each sub-environment will be subject to auto-reset `n-1` times.
                After it runs out of episodes, a sub-environment
                will keep reporting that it is both terminated and truncated,
                its observations will consist of dummy values (`nan` for
                `float`-typed observations, 0 for `int`-typed observations),
                and its rewards will be `nan`. The internal episode counter
                for the sub-environments will be reset when the `reset()`
                method of `SyncVectorEnv` is called.
                If `num_episodes` is left as None, auto-reset behavior will
                be enabled indefinitely.
            device: Optionally the device on which the observations, rewards,
                terminated and truncated booleans and info arrays will be
                reported. Please note that the sub-environments are always
                expected with a numpy interface. This argument is used only for
                optionally converting the sub-environments' state arrays to
                PyTorch tensors on the target device. If this is left as None,
                the reported arrays will be numpy arrays. If this is given as a
                string or as a `torch.device`, the reported arrays will be
                PyTorch tensors on the specified device.
        """
        self.__envs: Sequence[gym.Env] = [env_maker() for env_maker in env_makers]
        num_envs = len(self.__envs)
        if num_envs == 0:
            raise ValueError(
                "At least one sub-environment was expected, but got an empty collection of sub-environments."
            )

        self.__empty_info = bool(empty_info)
        self.__device = device

        single_observation_space = None
        single_action_space = None
        for i_env, env in enumerate(self.__envs):
            if i_env == 0:
                single_observation_space = env.observation_space
                if not isinstance(single_observation_space, Box):
                    raise TypeError(
                        f"Expected a Box-typed observation space, but encountered {single_observation_space}."
                    )
                single_action_space = env.action_space
                _must_be_supported_space(single_action_space)
            else:
                if env.observation_space.shape != single_observation_space.shape:
                    raise ValueError("The observation shapes of the sub-environments do not match")
                if isinstance(env.action_space, Discrete):
                    if not isinstance(single_action_space, Discrete):
                        raise TypeError("The action space types of the sub-environments do not match")
                    if env.action_space.n != single_action_space.n:
                        raise ValueError("The discrete numbers of actions of the sub-environments do not match")
                elif isinstance(env.action_space, Box):
                    if not isinstance(single_action_space, Box):
                        raise TypeError("The action space types of the sub-environments do not match")
                    if env.action_space.shape != single_action_space.shape:
                        raise ValueError("The action space shapes of the sub-environments do not match")
                else:
                    assert False, "Code execution should not have reached here. This is most probably a bug."

        self.__batched_obs_shape = (num_envs,) + single_observation_space.shape
        self.__batched_obs_dtype = single_observation_space.dtype
        self.__random_state: Optional[np.random.RandomState] = None

        if num_episodes is None:
            self.__num_episodes = None
            self.__num_episodes_counter = None
            self.__dummy_observation = None
        else:
            self.__num_episodes = int(num_episodes)
            if self.__num_episodes <= 0:
                raise ValueError(f"Expected `num_episodes` as a positive integer, but its value is {num_episodes}")
            self.__dummy_observation = np.zeros(single_observation_space.shape, dtype=single_observation_space.dtype)
            if "float" in str(self.__dummy_observation.dtype):
                self.__dummy_observation[:] = float("nan")
            self.__num_episodes_counter = np.ones(num_envs, dtype=int)

        super().__init__(num_envs, single_observation_space, single_action_space)

    def __pop_seed_kwargs(self) -> list:
        if self.__random_state is None:
            return [{} for _ in range(self.num_envs)]
        else:
            seeds = self.__random_state.randint(0, 2**32, self.num_envs)
            result = [{"seed": int(seed_integer)} for seed_integer in seeds]
            self.__random_state = None
            return result

    def __move_to_target_device(
        self,
        data: Union[np.ndarray, torch.Tensor, dict],
    ) -> Union[np.ndarray, torch.Tensor, dict]:
        from numbers import Real

        if self.__device is None:
            return data

        def move(x: object) -> object:
            if isinstance(x, (Real, bool, np.bool_, torch.Tensor, np.ndarray)):
                return torch.as_tensor(x, device=self.__device)
            else:
                return x

        if isinstance(data, dict):
            return {k: move(v) for k, v in data.items()}
        else:
            return move(data)

    def __move_each_to_target_device(self, *args) -> tuple:
        return tuple(self.__move_to_target_device(x) for x in args)

    def seed(self, seed_integer: Optional[int] = None):
        """
        Prepare an internal random number generator to be used by the next `reset()`.

        In more details, if an integer is given via the argument `seed_integer`,
        an internal random number generator (of type `numpy.random.RandomState`)
        will be instantiated with `seed_integer` as its seed. Then, the next time
        `reset()` is called, each sub-environment will be given a sub-seed, each
        sub-seed being a new integer generated from this internal random number
        generator. Once this operation is complete, the internal random generator
        is destroyed, so that the remaining reset operations will continue to
        be randomized according to the sub-environment-specific generators.

        On the other hand, if the argument `seed_integer` is given as `None`,
        the internal random number generator will be destroyed, meaning that the
        next call to `reset()` will reset each sub-environment without specifying
        any sub-seed at all.

        As an alternative, one can also provide a seed as a positional argument
        to `reset()`. The following two usages are equivalent:

        ```python
        vec_env = SyncVectorEnv(
            [function_to_make_a_single_env() for _ in range(number_of_sub_envs)]
        )

        # Usage 1 (calling seed and reset separately):
        vec_env.seed(an_integer)
        vec_env.reset()

        # Usage 2 (calling reset with a seed argument):
        vec_env.reset(seed=an_integer)
        ```

        Args:
            seed_integer: An integer if you wish each sub-environment to be
                randomized via a pseudo-random generator seeded by this given
                integer. Otherwise, this can be left as None.
        """
        if seed_integer is None:
            self.__random_state = None
        else:
            self.__random_state = np.random.RandomState(seed_integer)

    def reset(self, **kwargs) -> tuple:
        """
        Reset each sub-environment.

        Any keyword argument other than `seed` will be sent directly to the
        `reset(...)` methods of the underlying sub-environments.

        If, among the keyword arguments, there is `seed`, the value for this
        `seed` keyword argument will be expected either as None, or as an integer.
        The setting `seed=None` can be used if the user wishes to ensure that
        there will be no explicit seeding when resetting the sub-environments
        (even when the `seed(...)` method of `SyncVectorEnv` was called
        previously with an explicit seed integer).
        The setting `seed=S`, where `S` is an integer, causes the following
        steps to be executed:
        (i) prepare a temporary random number generator with seed `S`;
        (ii) from the temporary random number generator, generate `N` sub-seed
        integers where `N` is the number of sub-environments;
        (iii) reset each sub-environment with a sub-seed;
        (iv) destroy the temporary random number generator.

        Args:
            kwargs: Keyword arguments to be passed to the `reset()` methods
                of the underlying sub-environments. The keyword `seed` will be
                intercepted and treated specially.
        Returns:
            A tuple of the form `(observation, info)`, where `observation` is
            a numpy array storing the observations of all the sub-environments
            (where the leftmost dimension is the batch dimension), and `info`
            is the `info` dictionary. If possible, the values within the
            `info` dictionary will be combined to single numpy arrays as well.
            If this `SyncVectorEnv` was initialized with a `device`, the
            results will be in the form of PyTorch tensors on the specified device.
        """
        if "seed" in kwargs:
            self.seed(kwargs["seed"])
            remaining_kwargs = {k: v for k, v in kwargs.items() if k != "seed"}
        else:
            remaining_kwargs = kwargs

        if self.__num_episodes is not None:
            self.__num_episodes_counter[:] = self.__num_episodes

        seed_kwargs_list = self.__pop_seed_kwargs()
        observations = []
        infos = []
        for env, seed_kwargs in zip(self.__envs, seed_kwargs_list):
            observation, info = env.reset(**seed_kwargs, **remaining_kwargs)
            observations.append(observation)
            if not self.__empty_info:
                infos.append(info)

        if self.__empty_info:
            batched_info = {}
        else:
            batched_info = _batch_info_dicts(infos)

        return self.__move_each_to_target_device(np.stack(observations), batched_info)

    def step(self, action: Union[torch.Tensor, np.ndarray]) -> tuple:  # noqa: C901
        """
        Take a step within each sub-environment.

        Args:
            action: A numpy array or a PyTorch tensor that contains the action.
                The size of the leftmost dimension of this array or tensor
                is expected to be equal to the number of sub-environments.
        Returns:
            A tuple of the form (`observation`, `reward`, `terminated`,
            `truncated`, `info`) where `observation` is an array or tensor
            storing the observations of the sub-environments, `reward`
            is an array or tensor storing the rewards, `terminated` is an
            array or tensor of booleans stating whether or not the
            sub-environments got reset because of termination,
            `truncated` is an array or tensor of booleans stating whether or
            not the sub-environments got reset because of truncation, and
            `info` is a dictionary storing any additional information
            regarding the states of the sub-environments.
            If this `SyncVectorEnv` was initialized with a `device`, the
            results will be in the form of PyTorch tensors on the specified
            device.
        """
        if isinstance(action, torch.Tensor):
            action = action.cpu().numpy()
        else:
            action = np.asarray(action)

        if action.ndim == 0:
            raise ValueError("The action array must be at least 1-dimensional")

        batch_size = action.shape[0]
        if batch_size != self.num_envs:
            raise ValueError("The leftmost dimension of the action array does not match the number of sub-environments")

        batched_obs_shape = self.__batched_obs_shape
        batched_obs_dtype = self.__batched_obs_dtype
        num_envs = self.num_envs

        if self.__empty_info:
            initialized_info = {}
        else:
            initialized_info = [None for _ in range(num_envs)]

        class per_env:
            observation = np.zeros(batched_obs_shape, dtype=batched_obs_dtype)
            reward = np.zeros(num_envs, dtype=float)
            terminated = np.zeros(num_envs, dtype=bool)
            truncated = np.zeros(num_envs, dtype=bool)
            info = initialized_info

        def is_active_env(env_index: int) -> bool:
            if self.__num_episodes is None:
                return True
            return self.__num_episodes_counter[env_index] > 0

        def is_last_episode(env_index: int) -> bool:
            if self.__num_episodes is None:
                return False
            return self.__num_episodes_counter[env_index] == 1

        def decrement_episode_counter(env_index: int):
            if self.__num_episodes is None:
                return
            self.__num_episodes_counter[env_index] -= 1

        def apply_step(env_index: int, single_action: Union[np.ndarray, np.generic, Number, bool]) -> tuple:
            if not is_active_env(env_index):
                return self.__dummy_observation, float("nan"), True, True, {}

            env = self.__envs[env_index]

            observation, reward, terminated, truncated, info = env.step(single_action)

            if terminated or truncated:
                was_last_episode = is_last_episode(env_index)
                decrement_episode_counter(env_index)
                obs_after_reset, info_after_reset = env.reset()
                if not was_last_episode:
                    observation = obs_after_reset
                    info = info_after_reset

            return observation, reward, terminated, truncated, info

        for i_env in range(len(self.__envs)):
            # observation, reward, terminated, truncated, info = self.__envs[i_env].step(action[i_env])
            # done = terminated | truncated
            # if done:
            #     observation, info = self.__envs[i_env].reset()
            observation, reward, terminated, truncated, info = apply_step(i_env, action[i_env])

            per_env.observation[i_env] = observation
            per_env.reward[i_env] = reward
            per_env.terminated[i_env] = terminated
            per_env.truncated[i_env] = truncated
            if not self.__empty_info:
                per_env.info[i_env] = info

        if not self.__empty_info:
            per_env.info = _batch_info_dicts(per_env.info)

        return self.__move_each_to_target_device(
            per_env.observation,
            per_env.reward,
            per_env.terminated,
            per_env.truncated,
            per_env.info,
        )

    def render(self, *args, **kwargs):
        """
        Does not do anything, ignores its arguments, and returns None.
        """
        pass

    def close(self):
        """
        Close each sub-environment.
        """
        for env in self.__envs:
            env.close()

__init__(env_makers, *, empty_info=False, num_episodes=None, device=None)

__init__(...): Initialize the SyncVectorEnv.

Parameters:

Name Type Description Default
env_makers Iterable[Env]

An iterable object which stores functions that make the sub-environments to be managed by this SyncVectorEnv. The number of functions within this iterable object determines the number of sub-environments that will be managed.

required
empty_info bool

Whether or not to ignore the actual info dictionaries of the sub-environments and report empty info dictionaries instead. The default is False. Set this as True if you are not interested in additional infos, and if you wish to save some computational cycles by not merging the separate info dictionaries into a single dictionary.

False
num_episodes Optional[int]

Optionally an integer which represents the number of episodes one wishes to run for each sub-environment. If this num_episodes is given as a positive integer n, each sub-environment will be subject to auto-reset n-1 times. After its number of episodes runs out, a sub-environment will keep reporting that it is both terminated and truncated, its observations will consist of dummy values (nan for float-typed observations, 0 for int-typed observations), and its rewards will be nan. The internal episode counter for the sub-environments will be reset when the reset() method of SyncVectorEnv is called. If num_episodes is left as None, auto-reset behavior will be enabled indefinitely.

None
device Optional[Union[str, device]]

Optionally the device on which the observations, rewards, terminated and truncated booleans and info arrays will be reported. Please note that the sub-environments are always expected to have a numpy interface. This argument is used only for optionally converting the sub-environments' state arrays to PyTorch tensors on the target device. If this is left as None, the reported arrays will be numpy arrays. If this is given as a string or as a torch.device, the reported arrays will be PyTorch tensors on the specified device.

None
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(
    self,
    env_makers: Iterable[gym.Env],
    *,
    empty_info: bool = False,
    num_episodes: Optional[int] = None,
    device: Optional[Union[str, torch.device]] = None,
):
    """
    `__init__(...)`: Initialize the `SyncVectorEnv`.

    Args:
        env_makers: An iterable object which stores functions that make
            the sub-environments to be managed by this `SyncVectorEnv`.
            The number of functions within this iterable object
            determines the number of sub-environments that will be
            managed.
        empty_info: Whether or not to ignore the actual `info` dictionaries
            of the sub-environments and report empty `info` dictionaries
            instead. The default is False. Set this as True if you are not
            interested in additional `info`s, and if you wish to save some
            computational cycles by not merging the separate `info`
            dictionaries into a single dictionary.
        num_episodes: Optionally an integer which represents the number
            of episodes one wishes to run for each sub-environment.
            If this `num_episodes` is given as a positive integer `n`,
            each sub-environment will be subject to auto-reset `n-1` times.
            After its number of episodes runs out, a sub-environment
            will keep reporting that it is both terminated and truncated,
            its observations will consist of dummy values (`nan` for
            `float`-typed observations, 0 for `int`-typed observations),
            and its rewards will be `nan`. The internal episode counter
            for the sub-environments will be reset when the `reset()`
            method of `SyncVectorEnv` is called.
            If `num_episodes` is left as None, auto-reset behavior will
            be enabled indefinitely.
        device: Optionally the device on which the observations, rewards,
            terminated and truncated booleans and info arrays will be
            reported. Please note that the sub-environments are always
            expected to have a numpy interface. This argument is used only for
            optionally converting the sub-environments' state arrays to
            PyTorch tensors on the target device. If this is left as None,
            the reported arrays will be numpy arrays. If this is given as a
            string or as a `torch.device`, the reported arrays will be
            PyTorch tensors on the specified device.
    """
    self.__envs: Sequence[gym.Env] = [env_maker() for env_maker in env_makers]
    num_envs = len(self.__envs)
    if num_envs == 0:
        raise ValueError(
            "At least one sub-environment was expected, but got an empty collection of sub-environments."
        )

    self.__empty_info = bool(empty_info)
    self.__device = device

    single_observation_space = None
    single_action_space = None
    for i_env, env in enumerate(self.__envs):
        if i_env == 0:
            single_observation_space = env.observation_space
            if not isinstance(single_observation_space, Box):
                raise TypeError(
                    f"Expected a Box-typed observation space, but encountered {single_observation_space}."
                )
            single_action_space = env.action_space
            _must_be_supported_space(single_action_space)
        else:
            if env.observation_space.shape != single_observation_space.shape:
                raise ValueError("The observation shapes of the sub-environments do not match")
            if isinstance(env.action_space, Discrete):
                if not isinstance(single_action_space, Discrete):
                    raise TypeError("The action space types of the sub-environments do not match")
                if env.action_space.n != single_action_space.n:
                    raise ValueError("The discrete numbers of actions of the sub-environments do not match")
            elif isinstance(env.action_space, Box):
                if not isinstance(single_action_space, Box):
                    raise TypeError("The action space types of the sub-environments do not match")
            if env.action_space.shape != single_action_space.shape:
                    raise ValueError("The action space shapes of the sub-environments do not match")
            else:
                assert False, "Code execution should not have reached here. This is most probably a bug."

    self.__batched_obs_shape = (num_envs,) + single_observation_space.shape
    self.__batched_obs_dtype = single_observation_space.dtype
    self.__random_state: Optional[np.random.RandomState] = None

    if num_episodes is None:
        self.__num_episodes = None
        self.__num_episodes_counter = None
        self.__dummy_observation = None
    else:
        self.__num_episodes = int(num_episodes)
        if self.__num_episodes <= 0:
            raise ValueError(f"Expected `num_episodes` as a positive integer, but its value is {num_episodes}")
        self.__dummy_observation = np.zeros(single_observation_space.shape, dtype=single_observation_space.dtype)
        if "float" in str(self.__dummy_observation.dtype):
            self.__dummy_observation[:] = float("nan")
        self.__num_episodes_counter = np.ones(num_envs, dtype=int)

    super().__init__(num_envs, single_observation_space, single_action_space)
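
For illustration, here is a minimal construction sketch (not taken from the library's own documentation), assuming SyncVectorEnv can be imported from evotorch.neuroevolution.net.vecrl as the source path above suggests, and using CartPole-v1 purely as an example sub-environment:

```python
import gymnasium as gym

from evotorch.neuroevolution.net.vecrl import SyncVectorEnv  # assumed import path

vec_env = SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)],  # one maker per sub-environment
    empty_info=True,   # skip merging per-sub-environment info dictionaries
    num_episodes=3,    # each sub-environment auto-resets twice, then reports dummy states
    device="cpu",      # report results as PyTorch tensors on this device
)
```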

close()

Close each sub-environment.

Source code in evotorch/neuroevolution/net/vecrl.py
def close(self):
    """
    Close each sub-environment.
    """
    for env in self.__envs:
        env.close()

render(*args, **kwargs)

Does not do anything, ignores its arguments, and returns None.

Source code in evotorch/neuroevolution/net/vecrl.py
def render(self, *args, **kwargs):
    """
    Does not do anything, ignores its arguments, and returns None.
    """
    pass

reset(**kwargs)

Reset each sub-environment.

Any keyword argument other than seed will be sent directly to the reset(...) methods of the underlying sub-environments.

If, among the keyword arguments, there is seed, the value for this seed keyword argument will be expected either as None, or as an integer. The setting seed=None can be used if the user wishes to ensure that there will be no explicit seeding when resetting the sub-environments (even when the seed(...) method of SyncVectorEnv was called previously with an explicit seed integer). The setting seed=S, where S is an integer, causes the following steps to be executed: (i) prepare a temporary random number generator with seed S; (ii) from the temporary random number generator, generate N sub-seed integers where N is the number of sub-environments; (iii) reset each sub-environment with a sub-seed; (iv) destroy the temporary random number generator.

Parameters:

Name Type Description Default
kwargs

Keyword arguments to be passed to the reset() methods of the underlying sub-environments. The keyword seed will be intercepted and treated specially.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def reset(self, **kwargs) -> tuple:
    """
    Reset each sub-environment.

    Any keyword argument other than `seed` will be sent directly to the
    `reset(...)` methods of the underlying sub-environments.

    If, among the keyword arguments, there is `seed`, the value for this
    `seed` keyword argument will be expected either as None, or as an integer.
    The setting `seed=None` can be used if the user wishes to ensure that
    there will be no explicit seeding when resetting the sub-environments
    (even when the `seed(...)` method of `SyncVectorEnv` was called
    previously with an explicit seed integer).
    The setting `seed=S`, where `S` is an integer, causes the following
    steps to be executed:
    (i) prepare a temporary random number generator with seed `S`;
    (ii) from the temporary random number generator, generate `N` sub-seed
    integers where `N` is the number of sub-environments;
    (iii) reset each sub-environment with a sub-seed;
    (iv) destroy the temporary random number generator.

    Args:
        kwargs: Keyword arguments to be passed to the `reset()` methods
            of the underlying sub-environments. The keyword `seed` will be
            intercepted and treated specially.
    Returns:
        A tuple of the form `(observation, info)`, where `observation` is
        a numpy array storing the observations of all the sub-environments
        (where the leftmost dimension is the batch dimension), and `info`
        is the `info` dictionary. If possible, the values within the
        `info` dictionary will be combined to single numpy arrays as well.
        If this `SyncVectorEnv` was initialized with a `device`, the
        results will be in the form of PyTorch tensors on the specified device.
    """
    if "seed" in kwargs:
        self.seed(kwargs["seed"])
        remaining_kwargs = {k: v for k, v in kwargs.items() if k != "seed"}
    else:
        remaining_kwargs = kwargs

    if self.__num_episodes is not None:
        self.__num_episodes_counter[:] = self.__num_episodes

    seed_kwargs_list = self.__pop_seed_kwargs()
    observations = []
    infos = []
    for env, seed_kwargs in zip(self.__envs, seed_kwargs_list):
        observation, info = env.reset(**seed_kwargs, **remaining_kwargs)
        observations.append(observation)
        if not self.__empty_info:
            infos.append(info)

    if self.__empty_info:
        batched_info = {}
    else:
        batched_info = _batch_info_dicts(infos)

    return self.__move_each_to_target_device(np.stack(observations), batched_info)
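
As a rough illustration of the seed=S behavior described above, the sketch below spells out steps (i) to (iv) by hand for a handful of stand-alone environments; the exact range used for drawing the sub-seeds is an assumption made only for this sketch:

```python
import gymnasium as gym
import numpy as np

# Hypothetical stand-ins for the sub-environments managed by a SyncVectorEnv:
sub_envs = [gym.make("CartPole-v1") for _ in range(4)]

temporary_rng = np.random.RandomState(123)  # (i) temporary generator seeded with S = 123
sub_seeds = [int(temporary_rng.randint(0, 2**31 - 1)) for _ in sub_envs]  # (ii) one sub-seed per sub-environment
results = [env.reset(seed=s) for env, s in zip(sub_envs, sub_seeds)]  # (iii) reset each with its sub-seed
del temporary_rng  # (iv) the temporary generator is discarded
```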

seed(seed_integer=None)

Prepare an internal random number generator to be used by the next reset().

In more detail, if an integer is given via the argument seed_integer, an internal random number generator (of type numpy.random.RandomState) will be instantiated with seed_integer as its seed. Then, the next time reset() is called, each sub-environment will be given a sub-seed, each sub-seed being a new integer generated from this internal random number generator. Once this operation is complete, the internal random generator is destroyed, so that the remaining reset operations will continue to be randomized according to the sub-environment-specific generators.

On the other hand, if the argument seed_integer is given as None, the internal random number generator will be destroyed, meaning that the next call to reset() will reset each sub-environment without specifying any sub-seed at all.

As an alternative, one can also provide a seed as a keyword argument to reset(). The following two usages are equivalent:

vec_env = SyncVectorEnv(
    [function_to_make_a_single_env() for _ in range(number_of_sub_envs)]
)

# Usage 1 (calling seed and reset separately):
vec_env.seed(an_integer)
vec_env.reset()

# Usage 2 (calling reset with a seed argument):
vec_env.reset(seed=an_integer)

Parameters:

Name Type Description Default
seed_integer Optional[int]

An integer if you wish each sub-environment to be randomized via a pseudo-random generator seeded by this given integer. Otherwise, this can be left as None.

None
Source code in evotorch/neuroevolution/net/vecrl.py
def seed(self, seed_integer: Optional[int] = None):
    """
    Prepare an internal random number generator to be used by the next `reset()`.

    In more detail, if an integer is given via the argument `seed_integer`,
    an internal random number generator (of type `numpy.random.RandomState`)
    will be instantiated with `seed_integer` as its seed. Then, the next time
    `reset()` is called, each sub-environment will be given a sub-seed, each
    sub-seed being a new integer generated from this internal random number
    generator. Once this operation is complete, the internal random generator
    is destroyed, so that the remaining reset operations will continue to
    be randomized according to the sub-environment-specific generators.

    On the other hand, if the argument `seed_integer` is given as `None`,
    the internal random number generator will be destroyed, meaning that the
    next call to `reset()` will reset each sub-environment without specifying
    any sub-seed at all.

    As an alternative, one can also provide a seed as a keyword argument
    to `reset()`. The following two usages are equivalent:

    ```python
    vec_env = SyncVectorEnv(
        [function_to_make_a_single_env() for _ in range(number_of_sub_envs)]
    )

    # Usage 1 (calling seed and reset separately):
    vec_env.seed(an_integer)
    vec_env.reset()

    # Usage 2 (calling reset with a seed argument):
    vec_env.reset(seed=an_integer)
    ```

    Args:
        seed_integer: An integer if you wish each sub-environment to be
            randomized via a pseudo-random generator seeded by this given
            integer. Otherwise, this can be left as None.
    """
    if seed_integer is None:
        self.__random_state = None
    else:
        self.__random_state = np.random.RandomState(seed_integer)

step(action)

Take a step within each sub-environment.

Parameters:

Name Type Description Default
action Union[Tensor, ndarray]

A numpy array or a PyTorch tensor that contains the action. The size of the leftmost dimension of this array or tensor is expected to be equal to the number of sub-environments.

required
Source code in evotorch/neuroevolution/net/vecrl.py
def step(self, action: Union[torch.Tensor, np.ndarray]) -> tuple:  # noqa: C901
    """
    Take a step within each sub-environment.

    Args:
        action: A numpy array or a PyTorch tensor that contains the action.
            The size of the leftmost dimension of this array or tensor
            is expected to be equal to the number of sub-environments.
    Returns:
        A tuple of the form (`observation`, `reward`, `terminated`,
        `truncated`, `info`) where `observation` is an array or tensor
        storing the observations of the sub-environments, `reward`
        is an array or tensor storing the rewards, `terminated` is an
        array or tensor of booleans stating whether or not the
        sub-environments got reset because of termination,
        `truncated` is an array or tensor of booleans stating whether or
        not the sub-environments got reset because of truncation, and
        `info` is a dictionary storing any additional information
        regarding the states of the sub-environments.
        If this `SyncVectorEnv` was initialized with a `device`, the
        results will be in the form of PyTorch tensors on the specified
        device.
    """
    if isinstance(action, torch.Tensor):
        action = action.cpu().numpy()
    else:
        action = np.asarray(action)

    if action.ndim == 0:
        raise ValueError("The action array must be at least 1-dimensional")

    batch_size = action.shape[0]
    if batch_size != self.num_envs:
        raise ValueError("The leftmost dimension of the action array does not match the number of sub-environments")

    batched_obs_shape = self.__batched_obs_shape
    batched_obs_dtype = self.__batched_obs_dtype
    num_envs = self.num_envs

    if self.__empty_info:
        initialized_info = {}
    else:
        initialized_info = [None for _ in range(num_envs)]

    class per_env:
        observation = np.zeros(batched_obs_shape, dtype=batched_obs_dtype)
        reward = np.zeros(num_envs, dtype=float)
        terminated = np.zeros(num_envs, dtype=bool)
        truncated = np.zeros(num_envs, dtype=bool)
        info = initialized_info

    def is_active_env(env_index: int) -> bool:
        if self.__num_episodes is None:
            return True
        return self.__num_episodes_counter[env_index] > 0

    def is_last_episode(env_index: int) -> bool:
        if self.__num_episodes is None:
            return False
        return self.__num_episodes_counter[env_index] == 1

    def decrement_episode_counter(env_index: int):
        if self.__num_episodes is None:
            return
        self.__num_episodes_counter[env_index] -= 1

    def apply_step(env_index: int, single_action: Union[np.ndarray, np.generic, Number, bool]) -> tuple:
        if not is_active_env(env_index):
            return self.__dummy_observation, float("nan"), True, True, {}

        env = self.__envs[env_index]

        observation, reward, terminated, truncated, info = env.step(single_action)

        if terminated or truncated:
            was_last_episode = is_last_episode(env_index)
            decrement_episode_counter(env_index)
            obs_after_reset, info_after_reset = env.reset()
            if not was_last_episode:
                observation = obs_after_reset
                info = info_after_reset

        return observation, reward, terminated, truncated, info

    for i_env in range(len(self.__envs)):
        # observation, reward, terminated, truncated, info = self.__envs[i_env].step(action[i_env])
        # done = terminated | truncated
        # if done:
        #     observation, info = self.__envs[i_env].reset()
        observation, reward, terminated, truncated, info = apply_step(i_env, action[i_env])

        per_env.observation[i_env] = observation
        per_env.reward[i_env] = reward
        per_env.terminated[i_env] = terminated
        per_env.truncated[i_env] = truncated
        if not self.__empty_info:
            per_env.info[i_env] = info

    if not self.__empty_info:
        per_env.info = _batch_info_dicts(per_env.info)

    return self.__move_each_to_target_device(
        per_env.observation,
        per_env.reward,
        per_env.terminated,
        per_env.truncated,
        per_env.info,
    )
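
A minimal rollout sketch (under the same import assumption as before), sampling one action per sub-environment so that the leftmost dimension of the action array matches the number of sub-environments:

```python
import gymnasium as gym
import numpy as np

from evotorch.neuroevolution.net.vecrl import SyncVectorEnv  # assumed import path

vec_env = SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

observation, info = vec_env.reset(seed=0)
for _ in range(100):
    # One action per sub-environment; the leftmost dimension equals num_envs.
    action = np.array([vec_env.single_action_space.sample() for _ in range(vec_env.num_envs)])
    observation, reward, terminated, truncated, info = vec_env.step(action)
vec_env.close()
```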

TorchWrapper

A wrapper for vectorized or non-vectorized gymnasium environments.

This wrapper ensures that the actions, observations, rewards, and the 'done' values are expressed as PyTorch tensors.

Please note that TorchWrapper inherits neither from gymnasium.Wrapper nor from gymnasium.vector.VectorEnvWrapper. Once an environment is wrapped via TorchWrapper, it is NOT recommended to wrap it further via other types of wrappers.

Source code in evotorch/neuroevolution/net/vecrl.py
class TorchWrapper:
    """
    A wrapper for vectorized or non-vectorized gymnasium environments.

    This wrapper ensures that the actions, observations, rewards, and
    the 'done' values are expressed as PyTorch tensors.

    Please note that `TorchWrapper` inherits neither from
    `gymnasium.Wrapper` nor from `gymnasium.vector.VectorEnvWrapper`.
    Once an environment is wrapped via `TorchWrapper`, it is NOT
    recommended to wrap it further via other types of wrappers.
    """

    def __init__(
        self,
        env: Union[gym.Env, gym.vector.VectorEnv, "TorchWrapper"],
        *,
        force_classic_api: bool = False,
        discrete_to_continuous_act: bool = False,
        clip_actions: bool = False,
        # **kwargs,
    ):
        """
        `__init__(...)`: Initialize the TorchWrapper.

        Args:
            env: The gymnasium environment to be wrapped.
            force_classic_api: Set this as True if you would like to enable
                the classic API. In the classic API, the `reset(...)` method
                returns only the observation and the `step(...)` method
                returns 4 elements (not 5).
            discrete_to_continuous_act: When this is set as True and the
                wrapped environment has a Discrete action space, this wrapper
                will transform the action space to Box. A Discrete-action
                environment with `n` actions will be converted to a Box-action
                environment where the action length is `n`.
                The index of the largest value within the action vector will
                be applied to the underlying environment.
            clip_actions: Set this as True if you would like to clip the given
                actions so that they conform to the declared boundaries of the
                action space.
        """
        # super().__init__(env, **kwargs)
        self.env = env
        self.observation_space = env.observation_space
        self.action_space = env.action_space

        # Declare the variable that will store the array type of the underlying environment.
        self.__array_type: Optional[str] = None

        if hasattr(env.unwrapped, "single_observation_space"):
            # If the underlying environment has the attribute "single_observation_space",
            # then this is a vectorized environment.
            self.__vectorized = True

            # Get the observation and action spaces.
            obs_space = _unbatch_space(env.observation_space)
            act_space = _unbatch_space(env.action_space)
            self.single_observation_space = obs_space
            self.single_action_space = act_space
            self.num_envs = env.unwrapped.num_envs
        else:
            # If the underlying environment does not have the attribute "single_observation_space",
            # then this is a non-vectorized environment.
            self.__vectorized = False

            # Get the observation and action spaces.
            obs_space = env.observation_space
            act_space = env.action_space

        # Ensure that the observation and action spaces are supported.
        _must_be_supported_space(obs_space)
        _must_be_supported_space(act_space)

        # Store the choice of the user regarding "force_classic_api".
        self.__force_classic_api = bool(force_classic_api)

        if isinstance(act_space, Discrete) and discrete_to_continuous_act:
            # The underlying action space is Discrete and `discrete_to_continuous_act` is given as True.
            # Therefore, we convert the action space to continuous (to Box).

            # Take the shape and the dtype of the discrete action space.
            single_action_shape = (act_space.n,)
            single_action_dtype = torch.from_numpy(np.array([], dtype=act_space.dtype)).dtype

            # We store the integer dtype of the environment.
            self.__discrete_dtype = single_action_dtype

            if self.__vectorized:
                # If the environment is vectorized, we declare the new `action_space` and the `single_action_space`
                # for the environment.
                action_shape = (env.num_envs,) + single_action_shape
                self.single_action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
                self.action_space = Box(float("-inf"), float("inf"), shape=action_shape, dtype=np.float32)
            else:
                # If the environment is not vectorized, we declare the new `action_space` for the environment.
                self.action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
        else:
            # This is the case where we do not transform the action space.
            # The discrete dtype will not be used, so, we set it as None.
            self.__discrete_dtype = None

        if isinstance(act_space, Box) and clip_actions:
            # If the action space is Box and the wrapper is configured to clip the actions, then we store the lower
            # and the upper bounds for the actions.
            self.__act_lb = torch.from_numpy(act_space.low)
            self.__act_ub = torch.from_numpy(act_space.high)
        else:
            # If there will not be any action clipping, then we store the lower and the upper bounds as None.
            self.__act_lb = None
            self.__act_ub = None

    @property
    def array_type(self) -> Optional[str]:
        """
        Get the array type of the wrapped environment.
        This can be "jax", "torch", or "numpy".
        """
        return self.__array_type

    def __infer_array_type(self, observation):
        if self.__array_type is None:
            # If the array type is not determined yet, set it as the array type of the received observation.
            # If the observation has an unrecognized type, set the array type as "numpy".
            self.__array_type = array_type(observation, "numpy")

    def reset(self, *args, **kwargs):
        """Reset the environment"""

        # Call the reset method of the wrapped environment.
        reset_result = self.env.reset(*args, **kwargs)

        if isinstance(reset_result, tuple):
            # If we received a tuple of two elements, then we assume that this is the new gym API.
            # We note that we received an info dictionary.
            got_info = True
            # We keep the received observation and info.
            observation, info = reset_result
        else:
            # If we did not receive a tuple, then we assume that this is the old gym API.
            # We note that we did not receive an info dictionary.
            got_info = False
            # We keep the received observation.
            observation = reset_result
            # We did not receive an info dictionary, so, we set it as an empty dictionary.
            info = {}

        # We understand the array type of the underlying environment from the first observation.
        self.__infer_array_type(observation)

        # Convert the observation to a PyTorch tensor.
        observation = convert_to_torch(observation)

        if self.__force_classic_api:
            # If the option `force_classic_api` was set as True, then we only return the observation.
            return observation
        else:
            # Here we handle the case where `force_classic_api` was set as False.
            if got_info:
                # If we got an additional info dictionary, we return it next to the observation.
                return observation, info
            else:
                # If we did not get any info dictionary, we return only the observation.
                return observation

    def step(self, action, *args, **kwargs):
        """Take a step in the environment"""

        if self.__array_type is None:
            # If the array type is not known yet, then probably `reset()` has not been called yet.
            # We raise an error.
            raise ValueError(
                "Could not understand what type of array this environment works with."
                " Perhaps the `reset()` method has not been called yet?"
            )

        if self.__discrete_dtype is not None:
            # If the wrapped environment is discrete-actioned, then we take the integer counterpart of the action.
            action = torch.argmax(action, dim=-1).to(dtype=self.__discrete_dtype)

        if self.__act_lb is not None:
            # The internal variable `__act_lb` having a value other than None means that the initialization argument
            # `clip_actions` was given as True.
            # Therefore, we clip the actions.
            self.__act_lb = self.__act_lb.to(action.device)
            self.__act_ub = self.__act_ub.to(action.device)
            action = torch.max(action, self.__act_lb)
            action = torch.min(action, self.__act_ub)

        # Convert the action tensor to the expected array type of the underlying environment.
        action = convert_from_torch(action, self.__array_type)

        # Perform the step and get the result.
        result = self.env.step(action, *args, **kwargs)

        if not isinstance(result, tuple):
            # If the `step(...)` method returned anything other than tuple, we raise an error.
            raise TypeError(f"Expected a tuple as the result of the `step()` method, but received a {type(result)}")

        if len(result) == 5:
            # If the result is a tuple of 5 elements, then we note that we are using the new API.
            using_new_api = True
            # Take the observation, reward, two boolean variables done and done2 indicating that the episode(s)
            # has/have ended, and additional info.
            # `done` indicates whether or not the episode(s) reached terminal state(s).
            # `done2` indicates whether or not the episode(s) got truncated because of the timestep limit.
            observation, reward, done, done2, info = result
        elif len(result) == 4:
            # If the result is a tuple of 4 elements, then we note that we are not using the new API.
            using_new_api = False
            # Take the observation, reward, the done boolean flag, and additional info.
            observation, reward, done, info = result
            done2 = None
        else:
            raise ValueError(f"Unexpected number of elements were returned from step(): {len(result)}")

        # Convert the observation, reward, and done variables to PyTorch tensors.
        observation = convert_to_torch(observation)
        reward = convert_to_torch(reward)
        done = convert_to_torch_bool(done)
        if done2 is not None:
            done2 = convert_to_torch_bool(done2)

        if self.__force_classic_api:
            # This is the case where the initialization argument `force_classic_api` was set as True.
            if done2 is not None:
                # We combine the terminal state and truncation signals into a single boolean tensor indicating
                # whether or not the episode(s) ended.
                done = done | done2
            # Return 4 elements, compatible with the classic gym API.
            return observation, reward, done, info
        else:
            # This is the case where the initialization argument `force_classic_api` was set as False.
            if using_new_api:
                # If we are using the new API, then we return the 5-element result.
                return observation, reward, done, done2, info
            else:
                # If we are not using the new API, then we return the 4-element result.
                return observation, reward, done, info

    def seed(self, *args, **kwargs) -> Any:
        return self.env.seed(*args, **kwargs)

    def render(self, *args, **kwargs) -> Any:
        return self.env.render(*args, **kwargs)

    def close(self, *args, **kwargs) -> Any:
        return self.env.close(*args, **kwargs)

    @property
    def unwrapped(self) -> Union[gym.Env, gym.vector.VectorEnv]:
        return self.env.unwrapped
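
As a usage illustration (not part of the library's documentation), the sketch below wraps a single non-vectorized environment; the import path is assumed from the source location above:

```python
import gymnasium as gym

from evotorch.neuroevolution.net.vecrl import TorchWrapper  # assumed import path

# With `discrete_to_continuous_act=True`, the Discrete(2) action space of CartPole
# is exposed as a Box of length 2; the index of the largest value within the action
# vector is what gets applied to the underlying environment.
env = TorchWrapper(gym.make("CartPole-v1"), discrete_to_continuous_act=True)
print(env.action_space)  # Box(-inf, inf, (2,), float32)
```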

array_type property

Get the array type of the wrapped environment. This can be "jax", "torch", or "numpy".

__init__(env, *, force_classic_api=False, discrete_to_continuous_act=False, clip_actions=False)

__init__(...): Initialize the TorchWrapper.

Parameters:

Name Type Description Default
env Union[Env, VectorEnv, TorchWrapper]

The gymnasium environment to be wrapped.

required
force_classic_api bool

Set this as True if you would like to enable the classic API. In the classic API, the reset(...) method returns only the observation and the step(...) method returns 4 elements (not 5).

False
discrete_to_continuous_act bool

When this is set as True and the wrapped environment has a Discrete action space, this wrapper will transform the action space to Box. A Discrete-action environment with n actions will be converted to a Box-action environment where the action length is n. The index of the largest value within the action vector will be applied to the underlying environment.

False
clip_actions bool

Set this as True if you would like to clip the given actions so that they conform to the declared boundaries of the action space.

False
Source code in evotorch/neuroevolution/net/vecrl.py
def __init__(
    self,
    env: Union[gym.Env, gym.vector.VectorEnv, "TorchWrapper"],
    *,
    force_classic_api: bool = False,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    # **kwargs,
):
    """
    `__init__(...)`: Initialize the TorchWrapper.

    Args:
        env: The gymnasium environment to be wrapped.
        force_classic_api: Set this as True if you would like to enable
            the classic API. In the classic API, the `reset(...)` method
            returns only the observation and the `step(...)` method
            returns 4 elements (not 5).
        discrete_to_continuous_act: When this is set as True and the
            wrapped environment has a Discrete action space, this wrapper
            will transform the action space to Box. A Discrete-action
            environment with `n` actions will be converted to a Box-action
            environment where the action length is `n`.
            The index of the largest value within the action vector will
            be applied to the underlying environment.
        clip_actions: Set this as True if you would like to clip the given
            actions so that they conform to the declared boundaries of the
            action space.
    """
    # super().__init__(env, **kwargs)
    self.env = env
    self.observation_space = env.observation_space
    self.action_space = env.action_space

    # Declare the variable that will store the array type of the underlying environment.
    self.__array_type: Optional[str] = None

    if hasattr(env.unwrapped, "single_observation_space"):
        # If the underlying environment has the attribute "single_observation_space",
        # then this is a vectorized environment.
        self.__vectorized = True

        # Get the observation and action spaces.
        obs_space = _unbatch_space(env.observation_space)
        act_space = _unbatch_space(env.action_space)
        self.single_observation_space = obs_space
        self.single_action_space = act_space
        self.num_envs = env.unwrapped.num_envs
    else:
        # If the underlying environment does not have the attribute "single_observation_space",
        # then this is a non-vectorized environment.
        self.__vectorized = False

        # Get the observation and action spaces.
        obs_space = env.observation_space
        act_space = env.action_space

    # Ensure that the observation and action spaces are supported.
    _must_be_supported_space(obs_space)
    _must_be_supported_space(act_space)

    # Store the choice of the user regarding "force_classic_api".
    self.__force_classic_api = bool(force_classic_api)

    if isinstance(act_space, Discrete) and discrete_to_continuous_act:
        # The underlying action space is Discrete and `discrete_to_continuous_act` is given as True.
        # Therefore, we convert the action space to continuous (to Box).

        # Take the shape and the dtype of the discrete action space.
        single_action_shape = (act_space.n,)
        single_action_dtype = torch.from_numpy(np.array([], dtype=act_space.dtype)).dtype

        # We store the integer dtype of the environment.
        self.__discrete_dtype = single_action_dtype

        if self.__vectorized:
            # If the environment is vectorized, we declare the new `action_space` and the `single_action_space`
            # for the environment.
            action_shape = (env.num_envs,) + single_action_shape
            self.single_action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
            self.action_space = Box(float("-inf"), float("inf"), shape=action_shape, dtype=np.float32)
        else:
            # If the environment is not vectorized, we declare the new `action_space` for the environment.
            self.action_space = Box(float("-inf"), float("inf"), shape=single_action_shape, dtype=np.float32)
    else:
        # This is the case where we do not transform the action space.
        # The discrete dtype will not be used, so, we set it as None.
        self.__discrete_dtype = None

    if isinstance(act_space, Box) and clip_actions:
        # If the action space is Box and the wrapper is configured to clip the actions, then we store the lower
        # and the upper bounds for the actions.
        self.__act_lb = torch.from_numpy(act_space.low)
        self.__act_ub = torch.from_numpy(act_space.high)
    else:
        # If there will not be any action clipping, then we store the lower and the upper bounds as None.
        self.__act_lb = None
        self.__act_ub = None

reset(*args, **kwargs)

Reset the environment

Source code in evotorch/neuroevolution/net/vecrl.py
def reset(self, *args, **kwargs):
    """Reset the environment"""

    # Call the reset method of the wrapped environment.
    reset_result = self.env.reset(*args, **kwargs)

    if isinstance(reset_result, tuple):
        # If we received a tuple of two elements, then we assume that this is the new gym API.
        # We note that we received an info dictionary.
        got_info = True
        # We keep the received observation and info.
        observation, info = reset_result
    else:
        # If we did not receive a tuple, then we assume that this is the old gym API.
        # We note that we did not receive an info dictionary.
        got_info = False
        # We keep the received observation.
        observation = reset_result
        # We did not receive an info dictionary, so, we set it as an empty dictionary.
        info = {}

    # We understand the array type of the underlying environment from the first observation.
    self.__infer_array_type(observation)

    # Convert the observation to a PyTorch tensor.
    observation = convert_to_torch(observation)

    if self.__force_classic_api:
        # If the option `force_classic_api` was set as True, then we only return the observation.
        return observation
    else:
        # Here we handle the case where `force_classic_api` was set as False.
        if got_info:
            # If we got an additional info dictionary, we return it next to the observation.
            return observation, info
        else:
            # If we did not get any info dictionary, we return only the observation.
            return observation

step(action, *args, **kwargs)

Take a step in the environment

Source code in evotorch/neuroevolution/net/vecrl.py
def step(self, action, *args, **kwargs):
    """Take a step in the environment"""

    if self.__array_type is None:
        # If the array type is not known yet, then probably `reset()` has not been called yet.
        # We raise an error.
        raise ValueError(
            "Could not understand what type of array this environment works with."
            " Perhaps the `reset()` method has not been called yet?"
        )

    if self.__discrete_dtype is not None:
        # If the wrapped environment is discrete-actioned, then we take the integer counterpart of the action.
        action = torch.argmax(action, dim=-1).to(dtype=self.__discrete_dtype)

    if self.__act_lb is not None:
        # The internal variable `__act_lb` having a value other than None means that the initialization argument
        # `clip_actions` was given as True.
        # Therefore, we clip the actions.
        self.__act_lb = self.__act_lb.to(action.device)
        self.__act_ub = self.__act_ub.to(action.device)
        action = torch.max(action, self.__act_lb)
        action = torch.min(action, self.__act_ub)

    # Convert the action tensor to the expected array type of the underlying environment.
    action = convert_from_torch(action, self.__array_type)

    # Perform the step and get the result.
    result = self.env.step(action, *args, **kwargs)

    if not isinstance(result, tuple):
        # If the `step(...)` method returned anything other than tuple, we raise an error.
        raise TypeError(f"Expected a tuple as the result of the `step()` method, but received a {type(result)}")

    if len(result) == 5:
        # If the result is a tuple of 5 elements, then we note that we are using the new API.
        using_new_api = True
        # Take the observation, reward, two boolean variables done and done2 indicating that the episode(s)
        # has/have ended, and additional info.
        # `done` indicates whether or not the episode(s) reached terminal state(s).
        # `done2` indicates whether or not the episode(s) got truncated because of the timestep limit.
        observation, reward, done, done2, info = result
    elif len(result) == 4:
        # If the result is a tuple of 4 elements, then we note that we are not using the new API.
        using_new_api = False
        # Take the observation, reward, the done boolean flag, and additional info.
        observation, reward, done, info = result
        done2 = None
    else:
        raise ValueError(f"Unexpected number of elements were returned from step(): {len(result)}")

    # Convert the observation, reward, and done variables to PyTorch tensors.
    observation = convert_to_torch(observation)
    reward = convert_to_torch(reward)
    done = convert_to_torch_bool(done)
    if done2 is not None:
        done2 = convert_to_torch_bool(done2)

    if self.__force_classic_api:
        # This is the case where the initialization argument `force_classic_api` was set as True.
        if done2 is not None:
            # We combine the terminal state and truncation signals into a single boolean tensor indicating
            # whether or not the episode(s) ended.
            done = done | done2
        # Return 4 elements, compatible with the classic gym API.
        return observation, reward, done, info
    else:
        # This is the case where the initialization argument `force_classic_api` was set as False.
        if using_new_api:
            # If we are using the new API, then we return the 5-element result.
            return observation, reward, done, done2, info
        else:
            # If we are not using the new API, then we return the 4-element result.
            return observation, reward, done, info
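
A rough end-to-end sketch with a vectorized environment, under the same import assumptions as the previous sketches; actions are given as PyTorch tensors, and observations, rewards, and the boolean flags come back as PyTorch tensors:

```python
import gymnasium as gym
import torch

from evotorch.neuroevolution.net.vecrl import SyncVectorEnv, TorchWrapper  # assumed import path

wrapped = TorchWrapper(
    SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)]),
    discrete_to_continuous_act=True,
)

observation, info = wrapped.reset(seed=0)
for _ in range(10):
    # One continuous action vector per sub-environment; the wrapper applies
    # argmax to recover the discrete action for the underlying environments.
    action = torch.randn(wrapped.num_envs, wrapped.single_action_space.shape[0])
    observation, reward, terminated, truncated, info = wrapped.step(action)
wrapped.close()
```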

array_type(x, fallback=None)

Get the type of an array as a string ("jax", "torch", or "numpy"). If the type of the array cannot be determined and a fallback is provided, then the fallback value will be returned.

Parameters:

Name Type Description Default
x Any

The array whose type will be determined.

required
fallback Optional[str]

Fallback value, as a string, which will be returned if the array type cannot be determined.

None
Source code in evotorch/neuroevolution/net/vecrl.py
def array_type(x: Any, fallback: Optional[str] = None) -> str:
    """
    Get the type of an array as a string ("jax", "torch", or "numpy").
    If the type of the array cannot be determined and a fallback is provided,
    then the fallback value will be returned.

    Args:
        x: The array whose type will be determined.
        fallback: Fallback value, as a string, which will be returned if the
            array type cannot be determined.
    Returns:
        The array type as a string ("jax", "torch", or "numpy").
    Raises:
        TypeError: if the array type cannot be determined and a fallback
            value is not provided.
    """
    if is_jax_array(x):
        return "jax"
    elif isinstance(x, torch.Tensor):
        return "torch"
    elif isinstance(x, np.ndarray):
        return "numpy"
    elif fallback is not None:
        return fallback
    else:
        raise TypeError(f"The object has an unrecognized type: {type(x)}")

convert_from_torch(x, array_type)

Convert the given PyTorch tensor to an array of the specified type.

Parameters:

Name Type Description Default
x Tensor

The PyTorch tensor that will be converted.

required
array_type str

Type to which the PyTorch tensor will be converted. Expected as one of these strings: "jax", "torch", "numpy".

required
Source code in evotorch/neuroevolution/net/vecrl.py
def convert_from_torch(x: torch.Tensor, array_type: str) -> Any:
    """
    Convert the given PyTorch tensor to an array of the specified type.

    Args:
        x: The PyTorch tensor that will be converted.
        array_type: Type to which the PyTorch tensor will be converted.
            Expected as one of these strings: "jax", "torch", "numpy".
    Returns:
        The array of the specified type. Can be a JAX array, a numpy array,
        or PyTorch tensor.
    Raises:
        ValueError: if the array type cannot be determined.
    """
    if array_type == "torch":
        return x
    elif array_type == "jax":
        return torch_to_jax(x)
    elif array_type == "numpy":
        return x.cpu().numpy()
    else:
        raise ValueError(f"Unrecognized array type: {array_type}")

convert_to_torch(x)

Convert the given array to PyTorch tensor.

Parameters:

Name Type Description Default
x Any

Array to be converted. Can be a JAX array, a numpy array, a PyTorch tensor (in which case the input tensor will be returned as it is) or any Iterable object.

required
Source code in evotorch/neuroevolution/net/vecrl.py
def convert_to_torch(x: Any) -> torch.Tensor:
    """
    Convert the given array to PyTorch tensor.

    Args:
        x: Array to be converted. Can be a JAX array, a numpy array,
            a PyTorch tensor (in which case the input tensor will be
            returned as it is) or any Iterable object.
    Returns:
        The PyTorch counterpart of the given array.
    """
    if isinstance(x, torch.Tensor):
        return x
    elif is_jax_array(x):
        return jax_to_torch(x)
    elif isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    else:
        return torch.as_tensor(x)
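
A small round-trip sketch combining convert_to_torch with convert_from_torch (documented above), under the same import assumption:

```python
import numpy as np
import torch

from evotorch.neuroevolution.net.vecrl import convert_from_torch, convert_to_torch  # assumed import path

x_np = np.arange(6, dtype=np.float32)

x_torch = convert_to_torch(x_np)  # numpy array -> PyTorch tensor
assert isinstance(x_torch, torch.Tensor)

x_back = convert_from_torch(x_torch, "numpy")  # PyTorch tensor -> numpy array
assert isinstance(x_back, np.ndarray)
```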

convert_to_torch_bool(x)

Convert the given array to a PyTorch tensor of bools.

If the given object is an array of floating point numbers, then, values that are near to 0.0 (with a tolerance of 1e-4) will be converted to False, and the others will be converted to True. If the given object is an array of integers, then zero values will be converted to False, and non-zero values will be converted to True. If the given object is an array of booleans, then no change will be made to those boolean values.

The given object can be a JAX array, a numpy array, or a PyTorch tensor. The result will always be a PyTorch tensor.

Parameters:

Name Type Description Default
x Any

Array to be converted.

required
Source code in evotorch/neuroevolution/net/vecrl.py
def convert_to_torch_bool(x: Any) -> torch.Tensor:
    """
    Convert the given array to a PyTorch tensor of bools.

    If the given object is an array of floating point numbers, then, values
    that are near to 0.0 (with a tolerance of 1e-4) will be converted to
    False, and the others will be converted to True.
    If the given object is an array of integers, then zero values will be
    converted to False, and non-zero values will be converted to True.
    If the given object is an array of booleans, then no change will be made
    to those boolean values.

    The given object can be a JAX array, a numpy array, or a PyTorch tensor.
    The result will always be a PyTorch tensor.

    Args:
        x: Array to be converted.
    Returns:
        The array converted to a PyTorch tensor with its dtype set as bool.
    """
    x = convert_to_torch(x)
    if x.dtype == torch.bool:
        pass  # nothing to do
    elif "float" in str(x.dtype):
        x = torch.abs(x) > 1e-4
    else:
        x = torch.as_tensor(x, dtype=torch.bool)

    return x
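
A few illustrative calls matching the conversion rules described above (import path assumed as before):

```python
import numpy as np
import torch

from evotorch.neuroevolution.net.vecrl import convert_to_torch_bool  # assumed import path

print(convert_to_torch_bool(np.array([0.0, 1e-5, 0.5])))   # tensor([False, False,  True])
print(convert_to_torch_bool(np.array([0, 3, -1])))         # tensor([False,  True,  True])
print(convert_to_torch_bool(torch.tensor([True, False])))  # already bool, returned unchanged
```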

make_brax_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, **kwargs)

Make a brax environment and wrap it via TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the brax environment, as string (e.g. "humanoid"). If the string starts with "old::" (e.g. "old::humanoid", etc.), then the environment will be made using the namespace brax.v1 (which was introduced in brax version 0.9.0 where the updated implementations of the environments became default and the classical ones moved into brax.v1). You can use the prefix "old::" for reproducing previous results that were obtained or reported using an older version of brax.

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Batch size for the vectorized environment.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def make_brax_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    **kwargs,
) -> TorchWrapper:
    """
    Make a brax environment and wrap it via TorchWrapper.

    Args:
        env_name: Name of the brax environment, as string (e.g. "humanoid").
            If the string starts with "old::" (e.g. "old::humanoid", etc.),
            then the environment will be made using the namespace `brax.v1`
            (which was introduced in brax version 0.9.0 where the updated
            implementations of the environments became default and the classical
            ones moved into `brax.v1`).
            You can use the prefix "old::" for reproducing previous results
            that were obtained or reported using an older version of brax.
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Batch size for the vectorized environment.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The brax environment, wrapped by TorchWrapper.
    """

    if brax is not None:
        config = {}
        config.update(kwargs)
        if num_envs is not None:
            config["num_envs"] = num_envs
        env = VectorEnvFromBrax(env_name, **config)
        env = TorchWrapper(
            env,
            force_classic_api=force_classic_api,
            discrete_to_continuous_act=discrete_to_continuous_act,
            clip_actions=clip_actions,
        )
        return env
    else:
        _brax_is_missing()
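
A hedged usage sketch; this requires brax to be installed, and the import path is assumed from the source location above:

```python
from evotorch.neuroevolution.net.vecrl import make_brax_env  # assumed import path

# Vectorized brax humanoid with a batch size of 64, wrapped via TorchWrapper:
env = make_brax_env("humanoid", num_envs=64)

# The "old::" prefix requests the classical implementation living under brax.v1:
old_env = make_brax_env("old::humanoid", num_envs=64)
```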

make_gym_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, empty_info=False, num_episodes=None, device=None, **kwargs)

Make gymnasium environment(s) and wrap them via a TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the gymnasium environment, as string (e.g. "Humanoid-v4").

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Optionally a batch size for the vectorized environment. If given as an integer, the environment will be instantiated multiple times, and then wrapped via SyncVectorEnv.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
empty_info bool

Whether or not to ignore the info dictionaries of the sub-environments and always return an empty dictionary for the extra info. This feature is only available when num_envs is given as an integer. If num_envs is None, empty_info should be left as False.

False
num_episodes Optional[int]

Optionally an integer which specifies the number of episodes each sub-environment will run for. Until its number of episodes run out, each sub-environment will be subject to auto-reset. Alternatively, num_episodes can be left as None, which means that the sub-environments will be subject to auto-reset indefinitely. Please note that this feature can be used only when num_envs is given as an integer (i.e. when we work with a batch of environments). When num_envs is None, num_episodes is expected as None as well.

None
device Optional[Union[str, device]]

Optionally the device on which the state(s) of the environment(s) will be reported. If None, the reported arrays of the underlying environment(s) will be unchanged. If given as a torch.device or as a string, the reported arrays will be converted to PyTorch tensors and then moved to this specified device. This feature is only available when num_envs is given as an integer. If num_envs is None, device should also be None.

None
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def make_gym_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    empty_info: bool = False,
    num_episodes: Optional[int] = None,
    device: Optional[Union[str, torch.device]] = None,
    **kwargs,
) -> TorchWrapper:
    """
    Make gymnasium environment(s) and wrap them via a TorchWrapper.

    Args:
        env_name: Name of the gymnasium environment, as string (e.g. "Humanoid-v4").
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Optionally a batch size for the vectorized environment.
            If given as an integer, the environment will be instantiated multiple
            times, and then wrapped via `SyncVectorEnv`.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        empty_info: Whether or not to ignore the info dictionaries of the
            sub-environments and always return an empty dictionary for the
            extra info. This feature is only available when `num_envs` is given
            as an integer. If `num_envs` is None, `empty_info` should be left as
            False.
        num_episodes: Optionally an integer which specifies the number of
            episodes each sub-environment will run for. Until its number of
            episodes run out, each sub-environment will be subject to
            auto-reset. Alternatively, `num_episodes` can be left as None,
            which means that the sub-environments will be subject to auto-reset
            indefinitely.
            Please note that this feature can be used only when `num_envs` is
            given as an integer (i.e. when we work with a batch of
            environments). When `num_envs` is None, `num_episodes` is expected
            as None as well.
        device: Optionally the device on which the state(s) of the environment(s)
            will be reported. If None, the reported arrays of the underlying
            environment(s) will be unchanged. If given as a `torch.device` or as
            a string, the reported arrays will be converted to PyTorch tensors
            and then moved to this specified device.
            This feature is only available when `num_envs` is given as an
            integer. If `num_envs` is None, `device` should also be None.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The gymnasium environments, wrapped by a TorchWrapper.
    """

    def make_the_env():
        return gym.make(env_name, **kwargs)

    if num_envs is None:
        if empty_info:
            raise ValueError(
                f"The argument `empty_info` was received as {repr(empty_info)}."
                " The `empty_info` behavior can be turned on only when `num_envs` is not None."
                " However, `num_envs` was received as None."
            )
        if num_episodes is not None:
            raise ValueError(
                f"The argument `num_episodes` was received as {repr(num_episodes)}."
                " The `num_episodes` behavior can be turned on only when `num_envs` is not None."
                " However, `num_envs` was received as None."
            )
        if device is not None:
            raise ValueError(
                f"The argument `device` was received as {repr(device)}."
                " Having a target device is supported only when `num_envs` is not None."
                " However, `num_envs` was received as None."
            )
        to_be_wrapped = make_the_env()
    else:
        to_be_wrapped = SyncVectorEnv(
            [make_the_env for _ in range(num_envs)],
            empty_info=empty_info,
            num_episodes=num_episodes,
            device=device,
        )

    vec_env = TorchWrapper(
        to_be_wrapped,
        force_classic_api=force_classic_api,
        discrete_to_continuous_act=discrete_to_continuous_act,
        clip_actions=clip_actions,
    )

    return vec_env
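For reference, here is a minimal usage sketch (not part of the original documentation). The environment name "CartPole-v1" and the particular argument values are illustrative assumptions:

```python
from evotorch.neuroevolution.net.vecrl import make_gym_env

# Hypothetical example: 8 synchronized CartPole sub-environments whose
# observations are reported as PyTorch tensors on the CPU, with each
# sub-environment auto-resetting for up to 5 episodes.
vec_env = make_gym_env(
    "CartPole-v1",
    num_envs=8,
    num_episodes=5,
    device="cpu",
)

# With the default (non-classic) API, reset() follows the gymnasium
# convention and is expected to return an observation batch and an info dict.
obs, info = vec_env.reset()
```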

make_vector_env(env_name, *, force_classic_api=False, num_envs=None, discrete_to_continuous_act=False, clip_actions=False, gym_kwargs=None, brax_kwargs=None, **kwargs)

Make a new vectorized environment and wrap it via TorchWrapper.

Parameters:

Name Type Description Default
env_name str

Name of the environment, as string. If the string starts with "gym::" (e.g. "gym::Humanoid-v4", etc.), then it is assumed that the target environment is a traditional non-vectorized gymnasium environment. This non-vectorized environment will first be duplicated and wrapped via a SyncVectorEnv so that it gains a vectorized interface, and then it will be wrapped via TorchWrapper. If the string starts with "brax::" (e.g. "brax::humanoid", etc.), then it is assumed that the target environment is a brax environment which will be wrapped via TorchWrapper. If the string starts with "brax::old::" (e.g. "brax::old::humanoid", etc.), then the environment will be made using the namespace brax.v1 (which was introduced in brax version 0.9.0, where the updated implementations of the environments became the default and the classical ones moved into brax.v1). You can use the prefix "brax::old::" for reproducing previous results that were obtained or reported using an older version of brax. If the string does not contain "::" at all (e.g. "Humanoid-v4"), then it is assumed that the target environment is a gymnasium environment. Therefore, "gym::Humanoid-v4" and "Humanoid-v4" are equivalent. A usage sketch illustrating these prefixes is given after the source code below.

required
force_classic_api bool

Whether or not the classic gym API is to be used.

False
num_envs Optional[int]

Batch size for the vectorized environment.

None
discrete_to_continuous_act bool

Whether or not the discrete action space of the environment is to be converted to a continuous one. This does nothing if the environment's action space is not discrete.

False
clip_actions bool

Whether or not the actions should be explicitly clipped so that they stay within the declared action boundaries.

False
gym_kwargs Optional[dict]

Keyword arguments to pass only if the environment is a classical gymnasium environment.

None
brax_kwargs Optional[dict]

Keyword arguments to pass only if the environment is a brax environment.

None
kwargs

Expected in the form of additional keyword arguments, these are passed to the environment.

{}
Source code in evotorch/neuroevolution/net/vecrl.py
def make_vector_env(
    env_name: str,
    *,
    force_classic_api: bool = False,
    num_envs: Optional[int] = None,
    discrete_to_continuous_act: bool = False,
    clip_actions: bool = False,
    gym_kwargs: Optional[dict] = None,
    brax_kwargs: Optional[dict] = None,
    **kwargs,
) -> TorchWrapper:
    """
    Make a new vectorized environment and wrap it via TorchWrapper.

    Args:
        env_name: Name of the environment, as string.
            If the string starts with "gym::" (e.g. "gym::Humanoid-v4", etc.),
            then it is assumed that the target environment is a traditional
            non-vectorized gymnasium environment. This non-vectorized environment
            will first be duplicated and wrapped via a `SyncVectorEnv` so that
            it gains a vectorized interface, and then, it will be wrapped via
            `TorchWrapper`.
            If the string starts with "brax::" (e.g. "brax::humanoid", etc.),
            then it is assumed that the target environment is a brax
            environment which will be wrapped via TorchWrapper.
            If the string starts with "brax::old::" (e.g.
            "brax::old::humanoid", etc.), then the environment will be made
            using the namespace `brax.v1` (which was introduced in brax
            version 0.9.0 where the updated implementations of the environments
            became default and the classical ones moved into `brax.v1`).
            You can use the prefix "brax::old::" for reproducing previous
            results that were obtained or reported using an older version of
            brax.
            If the string does not contain "::" at all (e.g. "Humanoid-v4"),
            then it is assumed that the target environment is a gymnasium
            environment. Therefore, "gym::Humanoid-v4" and "Humanoid-v4"
            are equivalent.
        force_classic_api: Whether or not the classic gym API is to be used.
        num_envs: Batch size for the vectorized environment.
        discrete_to_continuous_act: Whether or not the discrete action
            space of the environment is to be converted to a continuous one.
            This does nothing if the environment's action space is not
            discrete.
        clip_actions: Whether or not the actions should be explicitly clipped
            so that they stay within the declared action boundaries.
        gym_kwargs: Keyword arguments to pass only if the environment is a
            classical gymnasium environment.
        brax_kwargs: Keyword arguments to pass only if the environment is a
            brax environment.
        kwargs: Expected in the form of additional keyword arguments, these
            are passed to the environment.
    Returns:
        The vectorized gymnasium environment, wrapped by TorchWrapper.
    """

    env_parts = str(env_name).split("::", maxsplit=1)

    if gym_kwargs is None:
        gym_kwargs = {}
    if brax_kwargs is None:
        brax_kwargs = {}

    kwargs_to_pass = {}
    kwargs_to_pass.update(kwargs)

    if len(env_parts) == 0:
        raise ValueError(f"Invalid value for `env_name`: {repr(env_name)}")
    elif len(env_parts) == 1:
        fn = make_gym_env
        kwargs_to_pass.update(gym_kwargs)
    elif len(env_parts) == 2:
        env_name = env_parts[1]
        if env_parts[0] == "gym":
            fn = make_gym_env
            kwargs_to_pass.update(gym_kwargs)
        elif env_parts[0] == "brax":
            fn = make_brax_env
            kwargs_to_pass.update(brax_kwargs)
        else:
            invalid_value = env_parts[0] + "::"
            raise ValueError(
                f"The argument `env_name` starts with {repr(invalid_value)}, implying that the environment is stored"
                f" in a registry named {repr(env_parts[0])}."
                f" However, the registry {repr(env_parts[0])} is not recognized."
                f" Supported environment registries are: 'gym', 'brax'."
            )
    else:
        assert False, "Unexpected value received from len(env_parts)"

    return fn(
        env_name,
        force_classic_api=force_classic_api,
        num_envs=num_envs,
        discrete_to_continuous_act=discrete_to_continuous_act,
        clip_actions=clip_actions,
        **kwargs_to_pass,
    )
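The following sketch (not part of the original documentation) illustrates the registry prefixes described above; the environment names are the same illustrative ones used in the docstring:

```python
from evotorch.neuroevolution.net.vecrl import make_vector_env

# A gymnasium environment; the "gym::" prefix is optional for gymnasium.
env_a = make_vector_env("gym::Humanoid-v4", num_envs=16)
env_b = make_vector_env("Humanoid-v4", num_envs=16)  # equivalent to env_a

# A brax environment (requires brax to be installed).
env_c = make_vector_env("brax::humanoid", num_envs=16)

# The classical (pre-0.9.0) implementation of the same brax environment.
env_d = make_vector_env("brax::old::humanoid", num_envs=16)
```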

reset_tensors(x, indices)

Reset the specified regions of the given tensor(s) to 0.

Note that the resetting is performed in-place, which means that the provided tensors are modified.

The regions are determined by the argument indices, which can be a sequence of booleans (in which case it is interpreted as a mask), or a sequence of integers (in which case it is interpreted as the list of indices).

For example, let us imagine that we have the following tensor:

import torch

x = torch.tensor(
    [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ],
    dtype=torch.float32,
)

If we wish to reset the rows with indices 0 and 2, we could use:

reset_tensors(x, [0, 2])

The new value of x would then be:

torch.tensor(
    [
        [0, 0, 0, 0],
        [4, 5, 6, 7],
        [0, 0, 0, 0],
        [12, 13, 14, 15],
    ],
    dtype=torch.float32,
)

The first argument does not have to be a single tensor. Instead, it can be a container (i.e. a dictionary-like object or an iterable) that stores tensors. In this case, each tensor stored by the container will be subject to resetting. In more detail, each tensor within the iterable(s) and each tensor within the value part of the dictionary-like object(s) will be reset.

As an example, let us assume that we have the following collection:

a = torch.tensor(
    [
        [0, 1],
        [2, 3],
        [4, 5],
    ],
    dtype=torch.float32,
)

b = torch.tensor(
    [
        [0, 10, 20],
        [30, 40, 50],
        [60, 70, 80],
    ],
    dtype=torch.float32,
)

c = torch.tensor(
    [
        [100],
        [200],
        [300],
    ],
    dtype=torch.float32,
)

d = torch.tensor([-1, -2, -3], dtype=torch.float32)

my_tensors = [a, {"1": b, "2": (c, d)}]

To clear the regions with indices 1 and 2, for example, we could do:

reset_tensors(my_tensors, [1, 2])

and the result would be:

>>> print(a)
torch.tensor(
    [
        [0, 1],
        [0, 0],
        [0, 0],
    ],
    dtype=torch.float32,
)

>>> print(b)
torch.tensor(
    [
        [0, 10, 20],
        [0, 0, 0],
        [0, 0, 0],
    ],
    dtype=torch.float32,
)

>>> print(c)
torch.tensor(
    [
        [100],
        [0],
        [0],
    ],
    dtype=torch.float32,
)

>>> print(d)
torch.tensor([-1, 0, 0], dtype=torch.float32)
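
Since indices can also be given as a sequence of booleans, the same kind of row-wise reset can be expressed with a mask. Below is a small sketch (not taken from the original docstring) which is equivalent to calling reset_tensors(x, [0, 2]) on the first example above:

```python
import torch

x = torch.tensor(
    [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ],
    dtype=torch.float32,
)

# A boolean mask selecting rows 0 and 2.
mask = torch.tensor([True, False, True, False])
reset_tensors(x, mask)  # rows 0 and 2 of x are now zeroed in-place
```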

Parameters:

Name Type Description Default
x Any

A tensor or a collection of tensors, whose values are subject to resetting.

required
indices MaskOrIndices

A sequence of integers or booleans, specifying which regions of the tensor(s) will be reset.

required
Source code in evotorch/neuroevolution/net/vecrl.py
def reset_tensors(x: Any, indices: MaskOrIndices):
    """
    Reset the specified regions of the given tensor(s) to 0.

    Note that the resetting is performed in-place, which means that the provided tensors are modified.

    The regions are determined by the argument `indices`, which can be a sequence of booleans (in which case it is
    interpreted as a mask), or a sequence of integers (in which case it is interpreted as the list of indices).

    For example, let us imagine that we have the following tensor:

    ```python
    import torch

    x = torch.tensor(
        [
            [0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11],
            [12, 13, 14, 15],
        ],
        dtype=torch.float32,
    )
    ```

    If we wish to reset the rows with indices 0 and 2, we could use:

    ```python
    reset_tensors(x, [0, 2])
    ```

    The new value of `x` would then be:

    ```
    torch.tensor(
        [
            [0, 0, 0, 0],
            [4, 5, 6, 7],
            [0, 0, 0, 0],
            [12, 13, 14, 15],
        ],
        dtype=torch.float32,
    )
    ```

    The first argument does not have to be a single tensor.
    Instead, it can be a container (i.e. a dictionary-like object or an iterable) that stores tensors.
    In this case, each tensor stored by the container will be subject to resetting.
    In more detail, each tensor within the iterable(s) and each tensor within the value part of the dictionary-like
    object(s) will be reset.

    As an example, let us assume that we have the following collection:

    ```python
    a = torch.tensor(
        [
            [0, 1],
            [2, 3],
            [4, 5],
        ],
        dtype=torch.float32,
    )

    b = torch.tensor(
        [
            [0, 10, 20],
            [30, 40, 50],
            [60, 70, 80],
        ],
        dtype=torch.float32,
    )

    c = torch.tensor(
        [
            [100],
            [200],
            [300],
        ],
        dtype=torch.float32,
    )

    d = torch.tensor([-1, -2, -3], dtype=torch.float32)

    my_tensors = [a, {"1": b, "2": (c, d)}]
    ```

    To clear the regions with indices 1 and 2, for example, we could do:

    ```python
    reset_tensors(my_tensors, [1, 2])
    ```

    and the result would be:

    ```
    >>> print(a)
    torch.tensor(
        [
            [0, 1],
            [0, 0],
            [0, 0],
        ],
        dtype=torch.float32,
    )

    >>> print(b)
    torch.tensor(
        [
            [0, 10, 20],
            [0, 0, 0],
            [0, 0, 0],
        ],
        dtype=torch.float32,
    )

    >>> print(c)
    torch.tensor(
        [
            [100],
            [0],
            [0],
        ],
        dtype=torch.float32,
    )

    >>> print(d)
    torch.tensor([-1, 0, 0], dtype=torch.float32)
    ```

    Args:
        x: A tensor or a collection of tensors, whose values are subject to resetting.
        indices: A sequence of integers or booleans, specifying which regions of the tensor(s) will be reset.
    """
    if isinstance(x, torch.Tensor):
        # If the first argument is a tensor, then we clear it according to the indices we received.
        x[indices] = 0
    elif isinstance(x, (str, bytes, bytearray)):
        # str, bytes, and bytearray are the types of `Iterable` that we do not wish to process.
        # Therefore, we explicitly add a condition for them here, and explicitly state that nothing should be done
        # when instances of them are encountered.
        pass
    elif isinstance(x, Mapping):
        # If the first argument is a Mapping (i.e. a dictionary-like object), then, for each value part of the
        # Mapping instance, we call this function itself.
        for key, value in x.items():
            reset_tensors(value, indices)
    elif isinstance(x, Iterable):
        # If the first argument is an Iterable (e.g. a list, a tuple, etc.), then, for each value contained by this
        # Iterable instance, we call this function itself.
        for value in x:
            reset_tensors(value, indices)
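
As a hypothetical illustration of why the recursive container handling above is convenient in a vectorized RL setting, one might zero the recurrent states of a batched policy for exactly those sub-environments whose episodes have just ended. The container layout and names below are illustrative assumptions, not part of vecrl:

```python
import torch

num_envs = 4
hidden_size = 8

# Hypothetical per-sub-environment recurrent states of a policy.
policy_state = {
    "h": torch.randn(num_envs, hidden_size),
    "c": torch.randn(num_envs, hidden_size),
}

# Suppose sub-environments 1 and 3 just terminated; zero their rows in-place.
done = torch.tensor([False, True, False, True])
reset_tensors(policy_state, done)
```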