Training MNIST30K

Training a CNN on MNIST using evotorch¶

This example demonstrates how to use the evotorch.neuroevolution.SupervisedNE problem class to train a CNN on MNIST. It follows the set-up described in the DeepMind paper [1].

To run this example, make sure that torchvision is installed.

Setting up the `Problem` class¶

First we will define the model. For this example, we will use the 'MNIST-30k' model from the paper, which is defined below. Note that Table 1 of the paper has a typo: the second convolution should have a 5x5 kernel rather than a 2x2 kernel. With the 5x5 kernel, the model has the number of parameters the authors listed.

import torch
from torch import nn
from evotorch.neuroevolution.net import count_parameters

class MNIST30K(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # The first convolution uses a 5x5 kernel and has 16 filters
        self.conv1 = nn.Conv2d(1, 16, kernel_size = 5, stride = 1, padding=2)
        # Then max pooling is applied with a kernel size of 2
        self.pool1 = nn.MaxPool2d(kernel_size = 2)
        # The second convolution uses a 5x5 kernel and has 32 filters
        self.conv2 = nn.Conv2d(16, 32, kernel_size = 5, stride = 1, padding = 2)
        # Another max pooling is applied with a kernel size of 2
        self.pool2 = nn.MaxPool2d(kernel_size = 2)

        # Apply layer normalization after the second pool
        self.norm = nn.LayerNorm(1568, elementwise_affine=False)

        # A final linear layer maps outputs to the 10 target classes
        self.out = nn.Linear(1568, 10)

        # All activations are ReLU
        self.act = nn.ReLU()

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # Apply the first conv + pool
        data = self.pool1(self.act(self.conv1(data)))
        # Apply the second conv + pool
        data = self.pool2(self.act(self.conv2(data)))

        # Flatten, then apply layer norm
        data = self.norm(data.flatten(start_dim = 1))

        # Apply the output linear layer
        data = self.out(data)

        return data

network = MNIST30K()
print(f'Network has {count_parameters(network)} parameters')
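The parameter count can also be checked by hand. The following is a minimal sketch in plain Python, assuming the layer shapes from the class above and that the layer norm contributes no parameters because `elementwise_affine=False`. The helpers `conv2d_params` and `linear_params` are made up for this illustration and are not part of evotorch or PyTorch.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus one bias per output channel of a square 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output feature of a linear layer."""
    return in_features * out_features + out_features


# 28x28 inputs keep their size through the padded 5x5 convolutions and are
# halved by each 2x2 max pooling: 28 -> 14 -> 7, with 32 channels at the end.
flattened_size = 32 * 7 * 7

conv1 = conv2d_params(1, 16, 5)
conv2_5x5 = conv2d_params(16, 32, 5)
conv2_2x2 = conv2d_params(16, 32, 2)  # what a 2x2 kernel would give instead
out = linear_params(flattened_size, 10)

print(flattened_size)           # 1568
print(conv2_5x5, conv2_2x2)     # 12832 2080
print(conv1 + conv2_5x5 + out)  # 28938
```

With the 5x5 kernel the total is just under 29 thousand, which is in line with the '30k' in the model's name.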

Now let's load the dataset, applying the standard transforms.

from torchvision import datasets, transforms

transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])
train_dataset = datasets.MNIST('../data', train=True, download=True,
                   transform=transform)
test_dataset = datasets.MNIST('../data', train=False,
                   transform=transform)
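As a rough illustration of what the two transforms do to a single pixel: `ToTensor` scales 8-bit pixel values to the range [0, 1], and `Normalize` then subtracts the given mean and divides by the given standard deviation. The sketch below reproduces that arithmetic in plain Python; `normalize_pixel` is a hypothetical helper for this illustration, not a torchvision function.

```python
MNIST_MEAN = 0.1307  # mean passed to transforms.Normalize above
MNIST_STD = 0.3081   # standard deviation passed to transforms.Normalize above


def normalize_pixel(raw_value: int) -> float:
    """Map an 8-bit pixel value to its normalized value."""
    scaled = raw_value / 255.0  # the scaling performed by ToTensor
    return (scaled - MNIST_MEAN) / MNIST_STD  # the shift and scale of Normalize


print(round(normalize_pixel(0), 4))    # -0.4242 (black background)
print(round(normalize_pixel(255), 4))  # 2.8215 (brightest stroke)
```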

Now we are ready to create a custom problem class. The problem below is configured to use 4 actors and to divide the available GPUs between them. You can scale this up to dozens or even hundreds of CPUs and GPUs on a Ray cluster simply by modifying the num_actors parameter.

from evotorch.neuroevolution import SupervisedNE

mnist_problem = SupervisedNE(
    train_dataset,  # Using the dataset specified earlier
    MNIST30K,  # Training the MNIST30K module designed earlier
    nn.CrossEntropyLoss(),  # Minimizing CrossEntropyLoss
    minibatch_size = 1024,  # With a minibatch size of 1024
    common_minibatch = True,  # Always using the same minibatch across all solutions on an actor
    num_actors = 4,  # The total number of CPUs used
    num_gpus_per_actor = 'max',  # Dividing all available GPUs between the 4 actors
    subbatch_size = 50,  # Evaluating solutions in sub-batches of size 50 ensures we won't run out of GPU memory for individual workers
)

Training¶

Now we can set up the searcher.

In the paper, the authors used SNES with, effectively, default parameters and a standard deviation of 1. They achieved 98%+ with a population size of only 1k, but this value can be pushed further. Note that with the distributed = True keyword argument, each actor computes a semi-update locally, and these semi-updates are then averaged.

In our example, we use PGPE with a population size of 3200. The hyperparameter configuration is shown below:

from evotorch.algorithms import PGPE

searcher = PGPE(
    mnist_problem,
    radius_init=2.25, # Initial radius of the search distribution
    center_learning_rate=1e-2, # Learning rate used by adam optimizer
    stdev_learning_rate=0.1, # Learning rate for the standard deviation
    popsize=3200,
    distributed=True, # Gradients are computed locally at actors and averaged
    optimizer="adam", # Using the adam optimizer
    ranking_method=None, # No rank-based fitness shaping is used
)
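To get a feel for the numbers involved, here is a small sketch in plain Python. It assumes the population is divided evenly among the actors, which is an assumption made for illustration rather than a statement about how evotorch schedules work. The helper `average_updates` is hypothetical and only mimics the idea of averaging per-actor semi-updates on toy vectors.

```python
import math

popsize = 3200      # population size given to PGPE
num_actors = 4      # number of actors given to SupervisedNE
subbatch_size = 50  # sub-batch size given to SupervisedNE

# Assumption: an even split of the population across the actors.
solutions_per_actor = popsize // num_actors
subbatches_per_actor = math.ceil(solutions_per_actor / subbatch_size)


def average_updates(per_actor_updates):
    """Element-wise mean of the update vectors proposed by the actors."""
    num_updates = len(per_actor_updates)
    return [sum(values) / num_updates for values in zip(*per_actor_updates)]


# Toy two-dimensional updates, one per actor (made-up numbers).
toy_updates = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [2.0, 2.0]]

print(solutions_per_actor, subbatches_per_actor)  # 800 16
print(average_updates(toy_updates))               # [2.0, 2.0]
```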

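Let's create some loggers. We'll run evolution for quite a long time, so it can be worth reducing the log frequency by increasing the interval argument. Here, both loggers report every generation (interval = 1).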

from evotorch.logging import StdOutLogger, PandasLogger
stdout_logger = StdOutLogger(searcher, interval = 1)
pandas_logger = PandasLogger(searcher, interval = 1)

Running evolution for 400 generations (note that in the paper, it was 10k generations)...

searcher.run(400)

We can visualize the progress:

pandas_logger.to_dataframe().mean_eval.plot()

And of course, it is worthwhile to measure the test performance:

#net = mnist_problem.parameterize_net(searcher.status['center']).cpu()
net = mnist_problem.make_net(searcher.status["center"]).cpu()

loss = torch.nn.CrossEntropyLoss()
net.eval()
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 256, shuffle = False)
test_loss = 0
correct = 0

with torch.no_grad():
    for data, target in test_loader:
        output = net(data)
        test_loss += loss(output, target).item() * data.shape[0]
        # Index of the largest logit is the predicted class
        pred = output.argmax(dim=1, keepdim=True)
        # Convert the count to a Python int so that it prints as a plain number
        correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
print('Test set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))

mnist_problem.kill_actors()
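The prediction step in the evaluation loop above takes, for each sample, the index of the largest logit and compares it with the target label. A minimal pure-Python sketch of the same bookkeeping follows; the helper `count_correct` and the toy logits are made up for illustration.

```python
def count_correct(logits, targets):
    """Count samples whose largest logit sits at the target class index."""
    correct = 0
    for row, target in zip(logits, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        if predicted == target:
            correct += 1
    return correct


# Three samples, three classes (made-up numbers).
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
toy_targets = [1, 0, 1]

num_correct = count_correct(toy_logits, toy_targets)
accuracy = 100.0 * num_correct / len(toy_targets)
print(f"{num_correct}/{len(toy_targets)} ({accuracy:.0f}%)")  # 2/3 (67%)
```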

References¶

[1] Lenc, Karel, et al. "Non-differentiable supervised learning with evolution strategies and hybrid methods." arXiv preprint arXiv:1906.03139 (2019).
