In this article, we've tried to cover them by training a CNN model to classify a dataset of car types. You can download the dataset here. Any other dataset can be used with slight changes to the code, as long as it follows this sub-folder structure:

r7bthvstxw-1
    ├── hatchback
    ├── motorcycle
    ├── pickup
    ├── sedan
    └── suv
r7bthvstxw-1 is the top folder of the dataset. It contains one sub-folder per category (hatchback, motorcycle, pickup, sedan, and suv), and each sub-folder holds the images of its own category. Now let's dig into the code. I start with the import section:

01 from tensorflow.keras.models import Sequential
02 from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, ReLU, Softmax
03 import cv2
04 import glob
05 import os
06 import numpy as np
07 from sklearn.preprocessing import LabelEncoder
08 import tensorflow as tf
09 from sklearn.model_selection import train_test_split
10 from tensorflow.keras.optimizers import Adam
11 from tensorflow.keras.callbacks import ModelCheckpoint
12 from tensorflow.keras import backend as K

Some of these imports may fail if your installed library versions differ from mine; a few minutes of searching usually resolves it. In the following section, I've introduced several variables as settings:

14 HEIGHT = 32
15 WIDTH = 32
16 DEPTH = 3
17 VLD_SZ = 0.3
18 DS_DIR = '/home/babak/Downloads/dataset/r7bthvstxw-1'
19 INIT_LR = 0.01
20 EPOCHS = 100
21 BATCH_SZ = 32

HEIGHT and WIDTH are the input height and width of the model, respectively. DEPTH is the number of input channels and can be set to either 1 or 3: 1 for grayscale (black and white) images and 3 for color images. VLD_SZ stands for validation size, the fraction of samples held out for validation (0.3 means 30%); the number of validation samples is determined accordingly. DS_DIR is the path of the dataset, as you can see. The remaining parameters belong to the training procedure: INIT_LR, EPOCHS, and BATCH_SZ are the initial learning rate, the number of epochs, and the batch size, respectively.
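DEPTH has to agree with the shape of the arrays that reach the model. With the default channels_last format, Conv2D expects each sample as HEIGHT x WIDTH x DEPTH, but cv2 returns grayscale images as 2-D arrays without a channel axis. The numpy sketch below uses zero arrays as stand-ins for images loaded by cv2 (it does not call OpenCV), and the helper to_batch is hypothetical; it shows how a reshape adds the missing axis.

```python
import numpy as np

HEIGHT, WIDTH = 32, 32

# Stand-ins for resized images: cv2 returns (H, W) for grayscale and (H, W, 3) for color.
gray_images = [np.zeros((HEIGHT, WIDTH), dtype=np.uint8) for _ in range(4)]
color_images = [np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8) for _ in range(4)]


def to_batch(images, depth):
    """Stack images into (samples, HEIGHT, WIDTH, depth) and scale pixels to [0, 1]."""
    return np.array(images, dtype="float32").reshape(-1, HEIGHT, WIDTH, depth) / 255.0


print(np.array(gray_images).shape)      # (4, 32, 32): no channel axis yet
print(to_batch(gray_images, 1).shape)   # (4, 32, 32, 1)
print(to_batch(color_images, 3).shape)  # (4, 32, 32, 3)
```

For color images the reshape changes nothing, so the same line works for both DEPTH settings.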

Next, we define a function that creates the model:

23 def make_mdl(num_class, input_w, input_h, input_d):
24     inp_shape = (input_h, input_w, input_d)
25
26     if K.image_data_format() == 'channels_first':
27         inp_shape = (input_d, input_h, input_w)
28
29     model = Sequential()
30     model.add(Conv2D(16, kernel_size=(3, 3), input_shape=inp_shape))
31     model.add(ReLU())
32     model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
33     model.add(Conv2D(32, kernel_size=(3, 3)))
34     model.add(ReLU())
35     model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
36     model.add(Flatten())
37     model.add(Dense(num_class))
38     model.add(Softmax())
39     return model

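In line 23, make_mdl is defined to create a CNN with four parameters: num_class, input_w, input_h, and input_d, which stand for the number of classes, the input width, the input height, and the input depth (number of channels). Lines 24 to 27 arrange these into the input shape, putting the channel axis first if Keras is configured for channels_first. The network consists of two blocks of a 3x3 convolution, a ReLU activation, and 2x2 max pooling, followed by a Dense layer with one neuron per class and a softmax. You may wonder why the number of classes is passed as a parameter. This makes the code more flexible: you can use it with a different number of class folders in your dataset. For example, I used the same code to solve the cat-vs-dog problem.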
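To see what make_mdl produces for the default 32 x 32 x 3 input, the layer shapes and parameter counts can be traced by hand. The plain-Python sketch below (no TensorFlow needed) assumes the Keras defaults of 'valid' padding and stride 1 for Conv2D, plus the 2x2, stride-2 pooling of lines 32 and 35; the helpers conv_valid, pool_valid, and trace_make_mdl are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 'valid' convolution along one axis."""
    return size - kernel + 1


def pool_valid(size, pool=2, stride=2):
    """Output length of 'valid' max pooling along one axis."""
    return (size - pool) // stride + 1


def trace_make_mdl(num_class, input_w=32, input_h=32, input_d=3):
    """Trace output shapes and trainable parameters of the make_mdl architecture."""
    h, w, d = input_h, input_w, input_d
    params = 0
    shapes = []
    for filters in (16, 32):  # the two Conv2D -> ReLU -> MaxPooling2D blocks
        params += 3 * 3 * d * filters + filters  # kernel weights + one bias per filter
        h, w, d = conv_valid(h), conv_valid(w), filters
        shapes.append(("conv", (h, w, d)))
        h, w = pool_valid(h), pool_valid(w)  # ReLU keeps the shape, pooling halves it
        shapes.append(("pool", (h, w, d)))
    flat = h * w * d
    shapes.append(("flatten", (flat,)))
    params += flat * num_class + num_class  # Dense weights + biases
    shapes.append(("dense", (num_class,)))
    return shapes, params


shapes, params = trace_make_mdl(num_class=5)
for name, shape in shapes:
    print(name, shape)
print("trainable parameters:", params)
```

For five classes this gives a 6 x 6 x 32 feature map before Flatten, 1152 features, and 10,853 trainable parameters, which is the total mdl.summary() should report for this architecture.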

Afterward, we read the images from the dataset folder:

41 # loading images
42 im_paths = glob.glob(os.path.join(DS_DIR, '*/*.jpg'))
43
44 x = []
45 y = []
46
47 # reading x and y
48 for p in im_paths:
49     y.append(p.split(sep=os.path.sep)[-2])
50     if DEPTH == 1:
51         im = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
52     else:
53         im = cv2.imread(p, cv2.IMREAD_COLOR)
54     im = cv2.resize(src=im, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_CUBIC)
55     x.append(im)
56
57 # normalizing x and y
58 x = np.array(x, dtype='float32').reshape(-1, HEIGHT, WIDTH, DEPTH) / 255.0  # reshape adds the channel axis grayscale images lack
59 enc = LabelEncoder()
60 y = enc.fit_transform(y)
61 y = tf.keras.utils.to_categorical(y)

In line 42, jpg images are searched for in every sub-folder of the dataset. The function glob from the glob module returns all file paths that match a pattern, and os.path.join attaches the pattern */*.jpg to the dataset folder. As a result, the search pattern will be similar to this:

'/home/babak/Downloads/dataset/r7bthvstxw-1/*/*.jpg'

Here * is a wildcard that matches any name, so the pattern means: in any sub-folder of the dataset folder, any file ending in .jpg. Moving forward, x and y, the inputs and their corresponding labels, are defined as lists in lines 44 and 45. Line 48 loops over every path p in im_paths. In line 49, p is split at the path separator character, os.path.sep, and the second-to-last item ([-2]), i.e. the name of the image's sub-folder, is taken as its label. Lines 50 to 55 read the image with cv2.imread: if DEPTH is 1, the image is read as grayscale with the IMREAD_GRAYSCALE flag; otherwise it is read in color with IMREAD_COLOR. In line 54, the image is resized to WIDTH x HEIGHT as the first preprocessing step, and in line 55 it is appended to x.

Preprocessing does not end there; both x and y need further transformation. In line 58, x is converted to a float NumPy array and divided by 255.0, which scales pixel values from 0-255 to 0-1. In line 59, an instance of LabelEncoder is created as enc; it converts the class-name strings into the integers 0, 1, 2, 3, 4 (classes are numbered in sorted order). Integers alone, however, are not what the categorical_crossentropy loss used later expects. The most common output encoding for neural-network classifiers is one-hot encoding, which line 61 applies with to_categorical. The following table illustrates how it converts class numbers to outputs:

Class number | output neuron 0 | output neuron 1 | output neuron 2 | output neuron 3 | output neuron 4
0            | 1               | 0               | 0               | 0               | 0
1            | 0               | 1               | 0               | 0               | 0
2            | 0               | 0               | 1               | 0               | 0
3            | 0               | 0               | 0               | 1               | 0
4            | 0               | 0               | 0               | 0               | 1

As the table shows, for each class all neurons are set to zero except the one whose index equals the class number, which is set to one.
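The labeling steps of lines 42 to 61 can be tried without any real images. The sketch below builds a throwaway folder tree with empty .jpg files (folder and file names are made up), takes each label from the parent folder name as line 49 does, and one-hot encodes it. It uses numpy's np.unique(..., return_inverse=True) and an identity matrix as stand-ins for LabelEncoder and to_categorical; both number the classes in sorted order, so the integer codes match.

```python
import glob
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as ds_dir:
    # Hypothetical mini dataset: one sub-folder per class, empty files stand in for images.
    for folder, name in [("sedan", "a.jpg"), ("suv", "b.jpg"), ("hatchback", "c.jpg"), ("sedan", "d.jpg")]:
        os.makedirs(os.path.join(ds_dir, folder), exist_ok=True)
        open(os.path.join(ds_dir, folder, name), "w").close()

    # Same pattern as line 42: any sub-folder, any .jpg file (sorted for a stable order).
    im_paths = sorted(glob.glob(os.path.join(ds_dir, "*", "*.jpg")))
    # Same rule as line 49: the parent folder name is the label.
    labels = [p.split(os.path.sep)[-2] for p in im_paths]

# Integer codes in sorted class order (what LabelEncoder.fit_transform returns).
classes, codes = np.unique(labels, return_inverse=True)
# One identity-matrix row per sample (what to_categorical returns).
one_hot = np.eye(len(classes))[codes]

print(classes.tolist())  # ['hatchback', 'sedan', 'suv']
print(labels)
print(one_hot)
```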

Once the samples (inputs and outputs) are ready, they must be divided into two or three groups, because separate samples are needed for training, validation, and evaluation (testing). If you are not worried about over-fitting, two groups are enough. Over-fitting is a situation in which your model memorizes the training inputs and their corresponding outputs instead of finding a meaningful relationship between input and output, so it performs well on the training samples but poorly on new ones. To keep things simple, I've divided the dataset into two groups: one for training and the rest for validation. If you need a separate test group, you can split it off in the same way.

63 # splitting data
64 (x_trn, x_vld, y_trn, y_vld) = train_test_split(x, y, test_size=VLD_SZ, random_state=42)
65

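In line 64, x and y are split into four groups (x_trn, x_vld, y_trn, y_vld) according to the validation size VLD_SZ. random_state=42 seeds the shuffling, so the split is the same every time you run the program; you can choose any integer you want. The number 42 itself is just a popular choice that comes from a joke in The Hitchhiker's Guide to the Galaxy (read about the number 42 here).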
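To make test_size and random_state concrete, here is a minimal numpy sketch of a seeded shuffle-and-split. It is not scikit-learn's implementation (sklearn uses its own random generator, so the exact indices differ), but as far as I know it follows the same size rule: the validation part gets ceil(test_size * n) samples and the training part gets the rest. The function seeded_split and the toy data are hypothetical.

```python
import math

import numpy as np


def seeded_split(x, y, test_size=0.3, random_state=42):
    """Hypothetical stand-in for train_test_split: shuffle with a fixed seed, then cut."""
    n = len(x)
    n_vld = math.ceil(test_size * n)  # validation part is rounded up
    order = np.random.default_rng(random_state).permutation(n)  # same seed -> same order
    vld_idx, trn_idx = order[:n_vld], order[n_vld:]
    # Same return order as train_test_split: x_trn, x_vld, y_trn, y_vld.
    return x[trn_idx], x[vld_idx], y[trn_idx], y[vld_idx]


x = np.arange(10)      # 10 toy "images"
y = np.arange(10) % 5  # 5 toy class labels, tied to x
x_trn, x_vld, y_trn, y_vld = seeded_split(x, y, test_size=0.3, random_state=42)
print(len(x_trn), len(x_vld))  # 7 3

# Re-running with the same seed reproduces exactly the same split.
again = seeded_split(x, y, test_size=0.3, random_state=42)
print(np.array_equal(again[1], x_vld))  # True
```

Inputs and labels are indexed with the same permutation, so every image keeps its label after the split.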

In the following section, we build our model and prepare it for training.

66 # determining num_class by the number of folders in the dataset
67 num_class = -1  # because os.walk also returns the input folder itself
68 for p in os.walk(DS_DIR):
69     num_class += 1
70
71 mdl = make_mdl(num_class=num_class, input_w=WIDTH, input_h=HEIGHT, input_d=DEPTH)

Before making the model, we need the number of classes in our dataset. It is assumed that the number of sub-folders in the dataset folder equals the number of classes. In lines 68 and 69, we loop through the dataset directory DS_DIR with os.walk and increment num_class for each directory it yields. One important thing about os.walk is that it yields the input folder itself in addition to its subdirectories. Consequently, the loop in line 68 iterates once more than the number of sub-folders, which is why num_class starts at -1 instead of subtracting 1 after the loop. Note also that os.walk is recursive: it descends into nested folders too, so this count is only correct if the class folders contain no sub-folders of their own. Now we can make the model, as shown in line 71.
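The effect of that recursion is easy to check on a temporary folder tree. In the sketch below, count_classes_walk mirrors lines 67 to 69, while count_classes is a hypothetical alternative that counts only the immediate sub-folders with os.scandir; the folder names, including the nested thumbnails folder, are made up. After line 59, len(enc.classes_) would be another way to get the number of classes.

```python
import os
import tempfile


def count_classes_walk(ds_dir):
    """The approach of lines 67-69: one count per folder os.walk yields, minus the top folder."""
    num_class = -1
    for _ in os.walk(ds_dir):
        num_class += 1
    return num_class


def count_classes(ds_dir):
    """Count only the immediate sub-folders of ds_dir (one per class)."""
    with os.scandir(ds_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir())


with tempfile.TemporaryDirectory() as ds_dir:
    for folder in ["sedan", "suv", "pickup"]:
        os.makedirs(os.path.join(ds_dir, folder))
    flat = (count_classes_walk(ds_dir), count_classes(ds_dir))
    # A nested folder (e.g. left over from unzipping) inflates the os.walk count.
    os.makedirs(os.path.join(ds_dir, "sedan", "thumbnails"))
    nested = (count_classes_walk(ds_dir), count_classes(ds_dir))

print(flat)    # (3, 3)
print(nested)  # (4, 3)
```

With the inflated count, the Dense layer would have more outputs than the one-hot labels have columns, and training would fail with a shape mismatch.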

In the next step, we set up the prerequisites for training:

72 opt = Adam(learning_rate=tf.keras.optimizers.schedules.InverseTimeDecay(INIT_LR, decay_steps=1, decay_rate=INIT_LR / EPOCHS))
73 mdl.compile(optimizer=opt, loss=tf.keras.losses.categorical_crossentropy, metrics=['accuracy'])
74
75 fn = os.path.sep.join(['check_points',
76                        "weights-{epoch:03d}-{val_loss:.4f}.hdf5"])
77
78 chk_pnt = ModelCheckpoint(
79     fn,
80     monitor="val_loss",
81     mode="min",
82     save_best_only=True,
83     verbose=1)

In line 72, we've defined the optimizer opt as Adam (short for adaptive moment estimation), with an initial learning rate of INIT_LR that decays over time with a decay rate of INIT_LR/EPOCHS. The model is then compiled in line 73, which specifies the optimizer, the loss function, and the metric(s) to monitor. During training, the optimizer (here Adam) tries to reduce the value of the loss function, categorical_crossentropy, while the accuracy metric is printed on the screen.
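The decay value is easy to misread as a per-epoch setting. Assuming the time-based decay of the legacy Keras optimizers, the learning rate after t batch updates (not epochs) is INIT_LR / (1 + decay * t), which is also what tf.keras.optimizers.schedules.InverseTimeDecay computes with decay_steps=1. The sketch below evaluates that formula with the article's settings; the training-set size of 1,000 samples and the helper decayed_lr are made up for illustration.

```python
INIT_LR = 0.01
EPOCHS = 100
BATCH_SZ = 32
DECAY = INIT_LR / EPOCHS  # 1e-4, the decay rate used in line 72


def decayed_lr(step, init_lr=INIT_LR, decay=DECAY):
    """Time-based decay: lr = init_lr / (1 + decay * step), step = number of batch updates."""
    return init_lr / (1.0 + decay * step)


n_trn = 1000                         # hypothetical number of training samples
steps_per_epoch = n_trn // BATCH_SZ  # 31 updates per epoch
for epoch in (0, 1, 50, 100):
    step = epoch * steps_per_epoch
    print(f"after {epoch:3d} epochs ({step:4d} updates): lr = {decayed_lr(step):.6f}")
```

With these numbers the rate has only fallen to about 0.0076 after 100 epochs; a larger training set means more updates per epoch and therefore a faster decay per epoch.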

Training a model is pointless if it cannot be used for inference afterwards, so the trained model must be saved. One option is to save it at the end of training. This, however, is risky for models that take a long time to train, especially complicated ones: a power cut or crash during a long run would lose all progress. The checkpoint concept, defined in line 78, is a remedy for these issues. Its first argument is fn, defined in lines 75 and 76. There, join concatenates the directory name check_points (a folder relative to the directory the program is run from) with a parameterized file name, using the path separator. The field {epoch:03d} is replaced by the current epoch number, zero-padded to three digits. Similarly, {val_loss:.4f} is replaced by the current validation loss with four decimal places. These fields must be surrounded by {}, as in line 76; the rest of the string appears literally in the name of the saved model.

The second argument, monitor, is the criterion that decides whether a model should be saved. Here it is val_loss, so the validation loss determines whether the current weights are saved. The third argument is mode. Since a lower validation loss is better, mode is min: with save_best_only=True, the model is saved only if the current validation loss is lower than the best (minimum) value seen so far. If you monitored validation accuracy instead (logged as val_accuracy with metrics=['accuracy'] in recent TensorFlow versions, or val_acc in older ones), mode would have to be max. Finally, verbose=1 prints a message about each save decision.
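Putting lines 75 to 83 together, the sketch below simulates which files a save_best_only checkpoint would write. It is a simplified re-implementation for illustration, not Keras's own code. As far as I know, ModelCheckpoint fills the file-name template with str.format, passing a 1-based epoch number and the values logged for that epoch, and saves only when the monitored value improves in the direction given by mode. The helper saved_epochs and all loss and accuracy values are made up.

```python
import math
import os

# Same file-name template as lines 75-76.
TEMPLATE = os.path.sep.join(["check_points", "weights-{epoch:03d}-{val_loss:.4f}.hdf5"])


def saved_epochs(values, mode="min"):
    """Simplified save_best_only rule: 1-based epochs at which a checkpoint is written."""
    best = math.inf if mode == "min" else -math.inf
    saved = []
    for epoch, value in enumerate(values, start=1):
        improved = value < best if mode == "min" else value > best
        if improved:
            best = value
            saved.append(epoch)
    return saved


val_loss = [0.92, 0.71, 0.75, 0.64, 0.66]      # made-up validation losses
val_accuracy = [0.55, 0.68, 0.66, 0.74, 0.74]  # made-up validation accuracies

for epoch in saved_epochs(val_loss, mode="min"):
    print(TEMPLATE.format(epoch=epoch, val_loss=val_loss[epoch - 1]))
print(saved_epochs(val_accuracy, mode="max"))  # [1, 2, 4]
print(saved_epochs(val_accuracy, mode="min"))  # [1]: wrong mode for an accuracy
```

The loss run writes weights-001-0.9200.hdf5, weights-002-0.7100.hdf5, and weights-004-0.6400.hdf5 inside check_points. With mode='min', the accuracy run would save only epoch 1, since no later value is lower; that is why mode has to match the direction in which the monitored metric improves.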

85 mdl.fit(
86     x_trn, y_trn,
87     batch_size=BATCH_SZ,
88     epochs=EPOCHS,
89     validation_data=(x_vld, y_vld),
90     steps_per_epoch=len(x_trn) // BATCH_SZ,
91     callbacks=[chk_pnt])

Finally, the model must be trained, which is done by calling its fit method. The first and second arguments are the training inputs and their corresponding outputs. The next two are the batch size, BATCH_SZ, and the number of epochs, EPOCHS. After that, the validation data is passed as a tuple, here (x_vld, y_vld). The following argument is the number of steps (batches) per epoch, calculated by dividing the number of training samples by the batch size: len(x_trn)//BATCH_SZ. The // operator performs floor division: it rounds the quotient down to the nearest integer, so for positive numbers it simply drops the fractional part. In other words, it behaves like floor, not ceil. Finally, the callbacks argument lets you run arbitrary code at points such as the end of each epoch. We pass chk_pnt, the checkpoint, which saves the model after an epoch whenever its monitored value (here the validation loss) is better than in all previous epochs.
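A quick check of the arithmetic with a made-up training-set size shows the difference between // and rounding up: the floor-division step count covers only complete batches, while math.ceil would add one extra, partial batch.

```python
import math

BATCH_SZ = 32
n_trn = 1000  # hypothetical number of training samples

floor_steps = n_trn // BATCH_SZ           # what line 90 computes
ceil_steps = math.ceil(n_trn / BATCH_SZ)  # rounding up instead
leftover = n_trn - floor_steps * BATCH_SZ

print(floor_steps, ceil_steps)  # 31 32
print(leftover)                 # 8 samples beyond the last full batch
print(-7 // 2)                  # -4: // floors, it does not just truncate
```

If steps_per_epoch is left out, Keras derives the number of steps from the data and batch_size on its own.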

After reading this article, you should be able to train your own CNN-based image classifier. Furthermore, you learned how to hold out a validation set to watch for over-fitting, and how to save the model at each epoch only if it is better than before. In other words, you should now be familiar with the concept of checkpoints. All of the code is available here.