First of all, we import the necessary libraries:
import pandas as pd
from torch.utils.data import Dataset
from sklearn.preprocessing import LabelEncoder
from torch.nn import Module
from torch import nn
from torch.optim import SGD
from torch.utils.data import random_split
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score
import numpy as np
Install the packages if you don’t have them. To train a network with pytorch, you must write the following sections of code:
- providing dataset
- making model
- training
- evaluation
Providing dataset:
torch provides different facilities to ease working with datasets. The Dataset class is one of them. To use it, we inherit a class from it, Csv_Loader for instance. The inherited class must override at least three methods: __init__, __len__, and __getitem__:
012 # providing dataset
013 class Csv_Loader(Dataset):
014     def __init__(self, data_frame):
015         self.x = data_frame.values[:, :-1]
016         self.y = data_frame.values[:, -1]
017         self.x = self.x.astype('float32')
018         self.y = LabelEncoder().fit_transform(self.y)
019         self.y = self.y.astype('float32')
020         self.y = self.y.reshape((len(self.y), 1))
021
022     def __len__(self):
023         return len(self.x)
024
025     def __getitem__(self, idx):
026         return [self.x[idx], self.y[idx]]
027
028     def get_splits(self, train_sz):
029         train_sz = round(train_sz * len(self.x))
030         test_sz = len(self.x) - train_sz
031
032         trn, tst = random_split(self, [train_sz, test_sz])
033
034         trn = DataLoader(trn, batch_size=32, shuffle=True)
035         tst = DataLoader(tst, batch_size=32, shuffle=False)
036
037         return trn, tst
In the __init__ method, you should load the samples, for example by reading a csv file or a data frame. In line 15, we select all columns of the data except the last one, and in line 16 we select the last column; they are stored in x and y, respectively. In line 17, we convert the input data to float32, because the inputs of the neural net must be float32. In line 18, we convert the labels in the last column of the dataset to a type the neural network accepts. The last column has two different values, “m” and “r”; “m” stands for mine and “r” stands for rock. Note that the inputs and outputs of a neural network must be numeric, so “m” and “r” have to be converted to numerical values. In line 18, the fit_transform method is called directly on a fresh LabelEncoder instance; since the encoder sorts the classes alphabetically, it maps “m” to 0 and “r” to 1. In line 19, the y values are also converted to float32. In line 20, the array of single values is converted to an array of arrays. This is what happens before and after line 20:
before: y = [0., 0., 0., 1., 1., 1.]
after : y = [[0.], [0.], [0.], [1.], [1.], [1.]]
Please don’t blame me for this conversion: the array-of-arrays layout is more meaningful in multi-output scenarios, and it also matches the (N, 1) shape of the model’s output, which BCELoss expects.
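If you want to see this label handling in isolation, here is a minimal standalone sketch of what lines 18–20 do (the sample labels below are made up for illustration):

import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(['m', 'm', 'r', 'r', 'm'])   # made-up sample labels
y = LabelEncoder().fit_transform(labels)       # classes are sorted, so 'm' -> 0, 'r' -> 1
print(y)                                       # [0 0 1 1 0]

y = y.astype('float32')
y = y.reshape((len(y), 1))                     # column vector, one row per sample
print(y.shape)                                 # (5, 1)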
Making model:
To make the model, you can use another facility of the torch library. In other words, you inherit from the Module class and implement its __init__ and forward methods.
039 # making model
040 class Slp(Module):
041     def __init__(self, n_input):
042         super(Slp, self).__init__()
043         self.h1 = nn.Linear(n_input, 1)
044         self.a1 = nn.Sigmoid()
045
046
047     # propagate forward
048     def forward(self, x):
049         x = self.h1(x)
050         x = self.a1(x)
051         return x
In line 40, the Slp class inherits from the Module class. In line 41, the __init__ method is implemented; it receives the number of input neurons as n_input. In line 42, the __init__ method of the superclass of Slp is called; you must do this to use the inherited class properly. In line 43, we define h1 (hidden layer 1) as an instance of the Linear class. nn is a submodule of torch, and Linear is a fully connected layer. The first parameter of the Linear constructor is the number of input neurons and the second is the number of output neurons. We want to keep the solution as simple as possible, therefore a single-layer perceptron (Slp) is used as the model. The number of inputs is equal to the number of features, here the number of input columns of the sonar dataset. The number of outputs is equal to 1 because we mapped ‘m’ and ‘r’ to ‘0’ and ‘1’, respectively. The second layer of the model, activation layer 1 (a1), is defined in line 44 and is a Sigmoid.
The forward method describes how the layers are connected and how data moves from the input to the output. We defined h1 and a1 as hidden layer 1 and activation layer 1 previously. In line 49 we apply h1 to the input x, then apply a1 to it in line 50, and finally return the output in line 51.
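As a quick check, the following sketch instantiates Slp and pushes a random batch through it (the batch size of 4 is arbitrary; 60 is the number of feature columns in the sonar dataset). Calling the model as model(x) invokes forward implicitly:

import torch

md = Slp(n_input=60)     # 60 feature columns in the sonar dataset
x = torch.rand(4, 60)    # arbitrary batch of 4 random samples
out = md(x)              # Module.__call__ dispatches to forward()
print(out.shape)         # torch.Size([4, 1])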
training:
In this section, we define an optimizer and a loss function. Afterward, we loop over the training data and perform the following steps in each iteration:
- Set the gradient values to zero by calling the zero_grad method of the optimizer.
- Calculate the loss and propagate it backward using the backward method.
- Update the weights using the step method of the optimizer.
054 # training
055 def train(model, train_data, n_epoch):
056     opt = SGD(model.parameters(), lr=0.01, momentum=0.9)
057     loss = nn.BCELoss()
058
059     for i in range(0, n_epoch):
060         for j, (inp, out) in enumerate(train_data):
061             opt.zero_grad()
062             pred = model(inp)
063             ls = loss(pred, out)
064             ls.backward()
065             opt.step()
In line 56, we define the optimizer as SGD (stochastic gradient descent) with an lr (learning rate) of 0.01 and a momentum of 0.9. The exact values are not critical; we chose them by rule of thumb, and you can pick any value between 0.0001 and 0.1 for lr and between 0.8 and 0.95 for momentum. In line 57, we define the loss as BCELoss, an abbreviation for binary cross-entropy loss.
In line 59, a loop repeats the training procedure n_epoch times. In line 60, another loop fetches input and output batches from train_data. As mentioned before, we set the gradients to zero in line 61 using the zero_grad method. In line 62, we calculate the output of the network for the input values; here, the forward method is called implicitly. The loss value is calculated as ls in line 63, its backward method is called in line 64, and finally the step method of the optimizer is called in line 65.
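The train function above runs silently. If you would like to watch the loss decrease, a variant like this sketch prints the average loss once per epoch (this is only a suggestion, reusing the imports from the top of the article):

def train_verbose(model, train_data, n_epoch):
    opt = SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss = nn.BCELoss()

    for i in range(n_epoch):
        total, n_batches = 0.0, 0
        for inp, out in train_data:
            opt.zero_grad()
            pred = model(inp)
            ls = loss(pred, out)
            ls.backward()
            opt.step()
            total += ls.item()      # accumulate the batch loss
            n_batches += 1
        print(f'epoch {i + 1}/{n_epoch}, avg loss: {total / n_batches:.4f}')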
evaluation:
Without evaluation, you cannot tell whether you are making progress. Suppose we have two classes and our accuracy is around 50%: our algorithm is as good as flipping a coin! In other words, if we achieve 50% accuracy on a two-class problem, we might as well replace our algorithm with a coin. To evaluate a model, we need two things: a model and a test dataset.
067 # evaluation
068 def eval(model, test_ds):
069     predicteds = []
070     actuals = []
071
072     for i, (inputs, targets) in enumerate(test_ds):
073         actual = targets.numpy()
074         actual = actual.reshape((len(actual), 1))
075
076         pred = model(inputs)\
077             .detach()\
078             .numpy()\
079             .round()
080
081         predicteds.append(pred)
082         actuals.append(actual)
083
084     predictions = np.vstack(predicteds)
085     actuals = np.vstack(actuals)
086
087     acc = accuracy_score(actuals, predictions)
088     return acc
In line 68, we define a function with two parameters, model and test_ds. In lines 69 and 70, the values predicted by the model and the actual values are stored in predicteds and actuals, respectively. In line 73, we get the numpy values of the targets. In line 74, we reshape the actual values as we did in line 20. In line 76, we predict values for the inputs, detach them, convert them to numpy, and round them into pred. The predicted and actual values are appended to the predicteds and actuals lists. In lines 84 and 85, we convert the lists to vertical stacks of numpy values using the vstack function. In line 87, accuracy_score calculates the accuracy, and it is returned in line 88.
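To make the rounding step concrete, here is a tiny standalone sketch with invented numbers: round() thresholds the sigmoid outputs at 0.5, and accuracy_score then compares the stacked arrays element-wise:

import numpy as np
from sklearn.metrics import accuracy_score

actuals = np.array([[0.], [1.], [1.], [0.]])          # made-up ground-truth labels
raw = np.array([[0.12], [0.93], [0.41], [0.08]])      # made-up sigmoid outputs
preds = raw.round()                                   # -> [[0.], [1.], [0.], [0.]]
print(accuracy_score(actuals, preds))                 # 0.75 (three of four match)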
Now that we have prepared the prerequisites, we put them together in the main program.
090 #main
091 path = 'sonar.all-data'
092 train_sz = 0.8
093
094 # providing dataset
095 df = pd.read_csv(path, header=None)
096 n_input = len(df.columns) - 1
097 ds = Csv_Loader(df)
098 trn, tst = ds.get_splits(train_sz)
099
100 # making model
101 md = Slp(n_input)
102 print('model: ')
103 print(md)
104 print("#"*50)
105
106 # training
107 print('training...')
108 train(model=md, train_data=trn, n_epoch=100)
109 print("#"*50)
110
111 # evaluation
112 acc = eval(model=md, test_ds=tst)
113 print('acc = ' + str(acc))
path is the file path to the csv file. train_sz is the ratio of the data that will be used for training. In line 95, the whole dataset is read. We need the number of input neurons; we could hard-code a constant value, but to keep things scalable we calculate the number of columns of the csv dataset, subtract 1, and save it as n_input in line 96. In lines 97 and 98, we build the dataset from the data frame (provided by the pandas library in line 95) and split it with the get_splits method.
In line 101, we create our network. In line 108, we train it. Finally, we calculate and show the network accuracy in lines 112 and 113.
You can download all code from my repo.
To keep everything simple, we do not cover the overfitting problem, saving the model to a file, visualizing the training procedure, data augmentation, model optimization, or checkpoints.