• Since my current research involves the classification of gene mutations, I will share some knowledge about deep learning.

Use of PyTorch

  • Automatic gradient
  • Create neural networks

Automatic gradient

The autograd package is at the core of all neural networks in PyTorch. Let’s learn the basics first, then train our first neural network. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, so every iteration can be different.
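As a small illustration of define-by-run (a sketch, not part of the original post): the graph is rebuilt on every run, so ordinary Python control flow can change which operations it records from one iteration to the next.

import torch
from torch.autograd import Variable

for _ in range(3):
    x = Variable(torch.randn(2), requires_grad=True)
    if x.data.sum() > 0:       # ordinary Python control flow
        y = (x * 3).sum()      # this iteration records one graph...
    else:
        y = (x * x).sum()      # ...or a different one
    y.backward()               # gradients follow whichever path was taken
    print(x.grad)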

1) Variable
autograd.Variable is the central class of this package. It wraps a Tensor and supports almost all operations defined on it. Once you have finished your computation, you can call .backward() and have all the gradients computed automatically.

You can access the raw tensor through the .data attribute, while the gradients with respect to this variable are accumulated into .grad.

Variables and Functions are interconnected and build up an acyclic graph. Each variable has a .grad_fn attribute that references the Function that created it (except for Variables created by the user, whose grad_fn is None).

If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar (it holds a single element), you don’t need to pass any arguments to backward(). However, if it has more elements, you need to specify a grad_output argument, which is a tensor of matching shape.

import torch
from torch.autograd import Variable

create a variable

x = Variable(torch.ones(2,2), requires_grad=True)
print(x)

do an operation on the variable

y = x + 2
print(y)

y is created as a result of an operation, so it has grad_fn.

print(y.grad_fn)

do more operations on y

z = y*y*3
out = z.mean()
print(z, out)

2) Gradients
Let’s do the backpropagation now. out.backward() is equivalent to out.backward(torch.Tensor([1.0])).

out.backward()

print gradient d(out)/dx

print(x.grad)
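You should see a 2x2 matrix filled with 4.5. This follows from the chain rule: out = (1/4) * Σ z_i with z_i = 3(x_i + 2)^2, so d(out)/dx_i = (3/2)(x_i + 2), which is 4.5 at x_i = 1.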

autograd can do more

x = torch.randn(3)
x = Variable(x, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)
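Because y is not a scalar here, backward() needs the gradients argument: autograd then computes the product of this vector with the Jacobian of y with respect to x, so x.grad is simply the supplied vector multiplied by the same power of two that was applied to x.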


Create neural networks

Neural networks can be built using the torch.nn package. nn relies on autograd to define models and differentiate them. An nn.Module contains the layers of the network and a forward(input) method that returns the output.

The typical steps for training a neural network are as follows:
a) Define a neural network that has some learnable parameters (such as weights)
b) Iterate over a dataset of inputs
c) Process the input through the network
d) Compute the loss (how far the output is from the correct value)
e) Propagate the gradients back into the network’s parameters
f) Update the weights of the network, using a simple update rule: weight = weight - learning_rate * gradient, i.e. new weight = old weight - learning rate * gradient value.

1) Define a neural network

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

You only need to define the forward function; the backward function (where the gradients are computed) is automatically defined for you by autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of the model are returned by net.parameters().

params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
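As a quick check (not in the original post), you can feed a random 32x32 input through the net; the fully connected layers above assume a 32x32 input image (32 → 28 → 14 → 10 → 5 after the convolutions and poolings). This also defines the input variable used in the loss example below:

input = Variable(torch.randn(1, 1, 32, 32))  # batch of 1, 1 channel, 32x32 image
output = net(input)
print(output)  # a 1x10 Variable of scores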

2) Loss Function
A loss function takes the (output, target) pair as input and computes a value that estimates how far the output is from the target. There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.

For example:

output = net(input)
target = Variable(torch.arange(1, 11))  # a dummy target, for example
target = target.view(1, -1)             # reshape so it has the same shape as the output (1, 10)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

When we call loss.backward(), the whole graph is differentiated with respect to the loss, and all Variables in the graph will have the gradient accumulated into their .grad attribute.

For illustration, let’s follow a few steps backward through the graph:

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU

3) Backpropagation
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, otherwise the new gradients will be accumulated on top of the existing ones.

Now, let’s call loss.backward() to see the value of the bias gradient of conv1 before and after the backward.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

4) Update weight
The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you may want to try a variety of different update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, there is a small package, torch.optim, that implements all of these methods. Using it is very simple:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
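For completeness, here is a minimal sketch of how steps a) to f) above fit together in a full training loop; data_loader and num_epochs are hypothetical placeholders for your own dataset iterator and epoch count:

for epoch in range(num_epochs):               # iterate over the dataset several times
    for input, target in data_loader:         # b) iterate over the inputs (placeholder loader)
        input, target = Variable(input), Variable(target)
        optimizer.zero_grad()                 # clear the gradient buffers
        output = net(input)                   # c) forward pass
        loss = criterion(output, target)      # d) compute the loss
        loss.backward()                       # e) backpropagate
        optimizer.step()                      # f) update the weights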


Original article: https://blog.csdn.net/zzlyw/article/details/78768996