rl

This namespace contains various RL-specific utilities.

ActClipLayer (Module)

Action clipping layer for a policy network, clipping each action to the lower and upper bounds of the given Box action space.

Source code in evotorch/neuroevolution/net/rl.py
class ActClipLayer(nn.Module):
    def __init__(self, box: gym.spaces.Box):
        nn.Module.__init__(self)

        self.lb = torch.as_tensor(box.low, dtype=torch.float32)
        self.ub = torch.as_tensor(box.high, dtype=torch.float32)

    def forward(self, x):
        return torch.min(torch.max(x, self.lb), self.ub)
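
For illustration, here is a minimal usage sketch (the environment name, network sizes, and the stand-in observation are assumptions made purely for this example): the layer is appended to a small policy network so that the network's raw outputs are clipped to the action space bounds.

import gym
import torch
from torch import nn
from evotorch.neuroevolution.net.rl import ActClipLayer

env = gym.make("Pendulum-v1")  # assumed example environment with a Box action space

# A small policy whose outputs are clipped to [action_space.low, action_space.high].
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 16),
    nn.Tanh(),
    nn.Linear(16, env.action_space.shape[0]),
    ActClipLayer(env.action_space),
)

obs = torch.randn(env.observation_space.shape[0])  # stand-in observation, for illustration only
action = policy(obs)  # each element lies within the action space bounds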

forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/rl.py
def forward(self, x):
    return torch.min(torch.max(x, self.lb), self.ub)

ObsNormLayer (Module)

Observation normalization layer for a policy network

Source code in evotorch/neuroevolution/net/rl.py
class ObsNormLayer(nn.Module):
    """Observation normalization layer for a policy network"""

    def __init__(self, stats: RunningStat, trainable_stats: bool):
        """`__init__(...)`: Initialize the observation normalization layer

        Args:
            stats: The RunningStat object storing the mean and stdev of
                all of the observations.
            trainable_stats: Whether or not the normalization data
                are to be stored as trainable parameters.
        """
        nn.Module.__init__(self)

        mean = torch.tensor(stats.mean, dtype=torch.float32)
        stdev = torch.tensor(stats.stdev, dtype=torch.float32)

        if trainable_stats:
            self.obs_mean = nn.Parameter(mean)
            self.obs_stdev = nn.Parameter(stdev)
        else:
            self.obs_mean = mean
            self.obs_stdev = stdev

    def forward(self, x):
        x = x - self.obs_mean
        x = x / self.obs_stdev
        return x
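
As a rough usage sketch (the RunningStat import path, its constructor, and its update(...) call are assumptions about the surrounding evotorch API, and the observation size is arbitrary), the layer can be placed in front of a policy network so that observations are normalized before reaching the first linear layer.

import torch
from torch import nn
from evotorch.neuroevolution.net.rl import ObsNormLayer
from evotorch.neuroevolution.net.runningstat import RunningStat  # assumed import path

# Accumulate observation statistics; random data is used here purely for illustration.
stats = RunningStat()
for _ in range(100):
    stats.update(torch.randn(8).numpy())  # assumed update(...) signature

# With trainable_stats=False, the stored mean and stdev stay fixed tensors.
policy = nn.Sequential(
    ObsNormLayer(stats, trainable_stats=False),
    nn.Linear(8, 2),
)

output = policy(torch.randn(8))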

__init__(self, stats, trainable_stats) special

__init__(...): Initialize the observation normalization layer

Parameters:

stats (RunningStat, required):
    The RunningStat object storing the mean and stdev of all of the observations.

trainable_stats (bool, required):
    Whether or not the normalization data are to be stored as trainable parameters.

Source code in evotorch/neuroevolution/net/rl.py
def __init__(self, stats: RunningStat, trainable_stats: bool):
    """`__init__(...)`: Initialize the observation normalization layer

    Args:
        stats: The RunningStat object storing the mean and stdev of
            all of the observations.
        trainable_stats: Whether or not the normalization data
            are to be stored as trainable parameters.
    """
    nn.Module.__init__(self)

    mean = torch.tensor(stats.mean, dtype=torch.float32)
    stdev = torch.tensor(stats.stdev, dtype=torch.float32)

    if trainable_stats:
        self.obs_mean = nn.Parameter(mean)
        self.obs_stdev = nn.Parameter(stdev)
    else:
        self.obs_mean = mean
        self.obs_stdev = stdev

forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in evotorch/neuroevolution/net/rl.py
def forward(self, x):
    x = x - self.obs_mean
    x = x / self.obs_stdev
    return x

reset_env(env)

Reset a gym environment.

For gym 1.0, the plan is to have a reset(...) method which returns a two-element tuple (observation, info) where info is an object providing any additional information regarding the initial state of the agent. However, the old (pre 1.0) gym API (and some environments which were written with old gym compatibility in mind) has (or have) a reset(...) method which returns a single object that is the initial observation. With the assumption that the observation space of the environment is NOT a tuple, this function can work with both pre-1.0 and (hopefully) after-1.0 versions of gym, and always returns the initial observation.

Please do not use this function on environments whose observation spaces are tuples, because then this function cannot distinguish between environments whose reset(...) methods return a tuple and environments whose reset(...) methods return a single observation object that is itself a tuple.

Parameters:

env (Env, required):
    The gym environment which will be reset.

Returns:

Iterable:
    The initial observation.

Source code in evotorch/neuroevolution/net/rl.py
def reset_env(env: gym.Env) -> Iterable:
    """
    Reset a gym environment.

    For gym 1.0, the plan is to have a `reset(...)` method which returns
    a two-element tuple `(observation, info)` where `info` is an object
    providing any additional information regarding the initial state of
    the agent. However, the old (pre 1.0) gym API (and some environments
    which were written with old gym compatibility in mind) has (or have)
    a `reset(...)` method which returns a single object that is the
    initial observation.
    With the assumption that the observation space of the environment
    is NOT a tuple, this function can work with both pre-1.0 and (hopefully)
    after-1.0 versions of gym, and always returns the initial observation.

    Please do not use this function on environments whose observation
    spaces are tuples, because then this function cannot distinguish between
    environments whose `reset(...)` methods return a tuple and environments
    whose `reset(...)` methods return a single observation object but that
    observation object is a tuple.

    Args:
        env: The gym environment which will be reset.
    Returns:
        The initial observation
    """
    result = env.reset()
    if isinstance(result, tuple) and (len(result) == 2):
        result = result[0]
    return result
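
A brief usage sketch (the environment name is an arbitrary assumption): regardless of whether the underlying gym version's reset(...) returns a lone observation or an (observation, info) pair, reset_env yields just the initial observation.

import gym
from evotorch.neuroevolution.net.rl import reset_env

env = gym.make("CartPole-v1")  # assumed example environment with a non-tuple observation space
observation = reset_env(env)   # the initial observation, whichever gym API version is installed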

take_step_in_env(env, action)

Take a step in the gym environment. Taking a step means performing the action provided via the arguments.

For gym 1.0, the plan is to have a step(...) method which returns a 5-element tuple containing observation, reward, terminated, truncated, info where terminated is a boolean indicating whether or not the episode is terminated because of the actions taken within the environment, and truncated is a boolean indicating whether or not the episode is finished because the time limit is reached. However, the old (pre 1.0) gym API (and some environments which were written with old gym compatibility in mind) has (or have) a step(...) method which returns 4 elements: observation, reward, done, info where done is a boolean indicating whether or not the episode is "done", either because of termination or because of truncation. This function can work with both pre-1.0 and (hopefully) after-1.0 versions of gym, and always returns the 4-element tuple as its result.

Parameters:

env (Env, required):
    The gym environment in which the given action will be performed.

action (Iterable, required):
    The action to be performed in the environment.

Returns:

tuple:
    A tuple in the form (observation, reward, done, info), where observation is the observation received after performing the action, reward is the amount of reward gained, done is a boolean value indicating whether or not the episode has ended, and info is additional information (usually as a dictionary).

Source code in evotorch/neuroevolution/net/rl.py
def take_step_in_env(env: gym.Env, action: Iterable) -> tuple:
    """
    Take a step in the gym environment.
    Taking a step means performing the action provided via the arguments.

    For gym 1.0, the plan is to have a `step(...)` method which returns a
    5-element tuple containing `observation`, `reward`, `terminated`,
    `truncated`, `info` where `terminated` is a boolean indicating whether
    or not the episode is terminated because of the actions taken within the
    environment, and `truncated` is a boolean indicating whether or not the
    episode is finished because the time limit is reached.
    However, the old (pre 1.0) gym API (and some environments which were
    written with old gym compatibility in mind) has (or have) a `step(...)`
    method which returns 4 elements: `observation`, `reward`, `done`, `info`
    where `done` is a boolean indicating whether or not the episode is
    "done", either because of termination or because of truncation.
    This function can work with both pre-1.0 and (hopefully) after-1.0
    versions of gym, and always returns the 4-element tuple as its result.

    Args:
        env: The gym environment in which the given action will be performed.
    Returns:
        A tuple in the form `(observation, reward, done, info)` where
        `observation` is the observation received after performing the action,
        `reward` is the amount of reward gained,
        `done` is a boolean value indicating whether or not the episode has
        ended, and
        `info` is additional information (usually as a dictionary).
    """
    result = env.step(action)
    if isinstance(result, tuple):
        n = len(result)
        if n == 4:
            observation, reward, done, info = result
        elif n == 5:
            observation, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            raise ValueError(
                f"The result of the `step(...)` method of the gym environment"
                f" was expected as a tuple of length 4 or 5."
                f" However, the received result is {repr(result)}, which is"
                f" of length {len(result)}."
            )
    else:
        raise TypeError(
            f"The result of the `step(...)` method of the gym environment"
            f" was expected as a tuple of length 4 or 5."
            f" However, the received result is {repr(result)}, which is"
            f" of type {type(result)}."
        )
    return observation, reward, done, info
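
Together with reset_env, this allows an episode rollout loop that does not depend on the installed gym version; the sketch below uses an arbitrary environment and a random policy, both of which are assumptions made for illustration only.

import gym
from evotorch.neuroevolution.net.rl import reset_env, take_step_in_env

env = gym.make("CartPole-v1")  # assumed example environment

observation = reset_env(env)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy, purely for illustration
    observation, reward, done, info = take_step_in_env(env, action)
    total_reward += reward

print("episode return:", total_reward)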