A guide to GPU-accelerated ship recognition in satellite imagery using Keras and R (part I)
This article is divided into a series of posts. In this post, I will explain the basic concepts behind convolutional neural networks and how to build them using Keras. In the next post, I will focus on improving the performance of the network.
Artificial intelligence (AI) has exploded in popularity in both business and society. Companies large and small are redirecting their digital transformation to include the technology that best represents what AI currently is: deep learning. Deep learning is a subset of machine learning, which in turn falls under data science. Both machine learning and deep learning sit at the peak of Gartner’s 2017 Hype Cycle and are already making a huge impact on the current technological status quo. Let’s take a look at one way of creating a basic machine learning model.
What are TensorFlow and Keras?
TensorFlow is an open-source software library for Machine Intelligence that allows you to deploy computations to multiple CPUs or GPUs. It was developed by researchers and engineers working on the Google Brain Team.
Keras is a high-level neural networks API capable of running on top of multiple back-ends, including TensorFlow, CNTK, and Theano. One of its biggest advantages is its “user friendliness”. With Keras you can easily build advanced models such as convolutional or recurrent neural networks.
To install TensorFlow and Keras from R, use the install_keras() function. If you want to use the GPU version, you have to install some prerequisites first. This can be difficult, but it is worth the extra effort when dealing with larger and more elaborate models. I strongly recommend doing this! You can read more here.
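A minimal installation sketch, assuming the keras package from CRAN. The `tensorflow = "gpu"` argument is how the package documentation selected the GPU build around the time of writing; newer releases may handle this differently, so treat it as an assumption and check the documentation for your version.

```r
# Install the R interface to Keras from CRAN
install.packages("keras")
library(keras)

# Default installation: CPU build of TensorFlow together with Keras
install_keras()

# GPU build instead (assumes the NVIDIA prerequisites, such as CUDA and cuDNN,
# are already installed on the machine)
# install_keras(tensorflow = "gpu")
```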
For this task we will use a dataset of 2800 satellite pictures from Kaggle. Every row contains information about one photo (80 pixels high, 80 pixels wide, 3 color channels in the RGB color space). To feed the data into a Keras model, we need to transform it into a 4-dimensional array (sample index, height, width, color channels). Every picture is associated with a label equal to 1 for a ship and 0 for a non-ship object. The labels also have to be transformed, into a binary matrix that Keras can use.
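The reshaping step can be sketched in base R on synthetic data. The names `flat`, `to_image` and `one_hot` are made up for illustration, and the storage layout (all red values first, then green, then blue, each channel stored row by row) is an assumption about the dataset rather than something stated in this post.

```r
# Synthetic stand-in for the dataset: 5 "pictures", one flat row of 80*80*3 values each
set.seed(42)
n <- 5
flat <- matrix(sample(0:255, n * 80 * 80 * 3, replace = TRUE), nrow = n)
labels <- c(1, 0, 0, 1, 0)

# Assumed layout: red block, green block, blue block, each stored row by row.
# array() fills column by column, so the first two dimensions are swapped
# afterwards to recover (height, width, channel).
to_image <- function(x) aperm(array(x, dim = c(80, 80, 3)), c(2, 1, 3))

# 4-dimensional array: (sample index, height, width, color channels), scaled to [0, 1]
images <- array(0, dim = c(n, 80, 80, 3))
for (i in seq_len(n)) {
  images[i, , , ] <- to_image(flat[i, ]) / 255
}

# Binary (one-hot) label matrix: column 1 = non-ship, column 2 = ship
one_hot <- diag(2)[labels + 1, , drop = FALSE]

dim(images)
one_hot
```

With the keras package loaded, `to_categorical(labels, 2)` should produce the same label matrix.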
Now we can take a look at a sample of our data. Notice that if a ship appears only partially in a picture, the picture is not labeled as 1.
The last thing we have to do is to split our data into training and test sets.
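A minimal sketch of such a split in base R, using synthetic arrays in place of the real pictures. The 80/20 ratio and the variable names are assumptions made for illustration; the post does not state which ratio was used.

```r
set.seed(123)

# Synthetic stand-ins: 100 random "pictures" and random one-hot labels
n <- 100
images <- array(runif(n * 80 * 80 * 3), dim = c(n, 80, 80, 3))
labels <- diag(2)[sample(1:2, n, replace = TRUE), ]

# Randomly pick 80% of the sample indices for training (assumed ratio)
train_idx <- sample(seq_len(n), size = round(0.8 * n))

# drop = FALSE keeps the array dimensions even for very small subsets
x_train <- images[train_idx, , , , drop = FALSE]
y_train <- labels[train_idx, , drop = FALSE]
x_test  <- images[-train_idx, , , , drop = FALSE]
y_test  <- labels[-train_idx, , drop = FALSE]

dim(x_train)
dim(x_test)
```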
In Keras you can build models in 3 different ways using:
a sequential model
For now, we will only use sequential models. But before that, we have to understand the basic concepts behind convolutional neural networks.
Convolutional neural networks (CNNs), or ConvNets, are a class of deep, feed-forward artificial neural networks designed for problems such as image, video and audio recognition and object detection. The architecture of a ConvNet differs depending on the problem, but there are some basic commonalities.
The first type of layer in CNNs is the convolutional layer, the core building block of ConvNets. Simply put, we take a small set of filters (also called kernels), place each one on a part of the original image, and compute the dot product between the kernel and the corresponding image patch. Next, we move the filter to the next position and repeat. The number of pixels by which we move the filter is called the stride. After computing the dot product across the whole image, we get a so-called activation map.
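To make the sliding dot product concrete, here is a small base R sketch for a single kernel and a single-channel image. The function name `convolve2d` is made up for illustration, and the sketch ignores padding and multiple channels, which real convolutional layers handle as well.

```r
# Slide one square kernel over a single-channel image and collect the dot products
convolve2d <- function(image, kernel, stride = 1) {
  k <- nrow(kernel)  # assumes a square kernel
  out_rows <- (nrow(image) - k) %/% stride + 1
  out_cols <- (ncol(image) - k) %/% stride + 1
  out <- matrix(0, out_rows, out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      row0 <- (i - 1) * stride + 1   # top-left corner of the current patch
      col0 <- (j - 1) * stride + 1
      patch <- image[row0:(row0 + k - 1), col0:(col0 + k - 1)]
      out[i, j] <- sum(patch * kernel)  # dot product of patch and kernel
    }
  }
  out
}

image <- matrix(1:16, nrow = 4, byrow = TRUE)
kernel <- matrix(c(1, 0,
                   0, -1), nrow = 2, byrow = TRUE)

# 4x4 image, 2x2 kernel, stride 1 gives a 3x3 activation map
activation_map <- convolve2d(image, kernel, stride = 1)
activation_map
```

With a stride of 2 the same call yields a 2x2 activation map, which shows how a larger stride shrinks the output.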
The second type of layer in CNNs is the pooling layer. This layer reduces the dimensionality of the activation maps. There are several types of pooling, but max pooling is the most commonly used. As with convolutional layers, we have a filter and a stride. After placing the filter on a part of the activation map, we take the maximum value from that part and move to the next region by the number of pixels specified by the stride.
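Max pooling can be sketched in base R in much the same way. The function name `max_pool` and the 2x2 window with stride 2 are illustrative choices, not values taken from the post.

```r
# Take the maximum of each size-by-size window, moving by `stride` pixels
max_pool <- function(activation_map, size = 2, stride = 2) {
  out_rows <- (nrow(activation_map) - size) %/% stride + 1
  out_cols <- (ncol(activation_map) - size) %/% stride + 1
  out <- matrix(0, out_rows, out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      row0 <- (i - 1) * stride + 1
      col0 <- (j - 1) * stride + 1
      window <- activation_map[row0:(row0 + size - 1), col0:(col0 + size - 1)]
      out[i, j] <- max(window)
    }
  }
  out
}

activation_map <- matrix(c(1, 3, 2, 4,
                           5, 6, 1, 2,
                           7, 2, 9, 0,
                           1, 8, 3, 4), nrow = 4, byrow = TRUE)

# The 4x4 map is reduced to 2x2: one maximum per non-overlapping 2x2 window
pooled <- max_pool(activation_map)
pooled
```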
The third type of layer in CNNs is the activation layer. In this layer, values from the activation maps are transformed by an activation function. There are several functions to choose from, but the most common one is the rectified linear unit (ReLU).
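ReLU is simply f(x) = max(0, x) applied to every element, so negative values become zero and positive values pass through unchanged. A one-line base R sketch (the name `relu` is chosen here for illustration):

```r
# Element-wise rectified linear unit: negative values are replaced by 0
relu <- function(x) pmax(x, 0)

activation_map <- matrix(c(-2, 0.5, 3, -0.1), nrow = 2)
rectified <- relu(activation_map)
rectified
```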
The fourth type of layer is the densely (fully) connected layer, the classical layer used in feed-forward neural networks. This fully connected layer is placed at the end of a ConvNet.
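As a rough illustration, a fully connected layer multiplies the flattened activation maps by a weight matrix and adds a bias. The sketch below uses random weights and a softmax output over two classes; the sizes, the random values and the choice of softmax are assumptions made for the example.

```r
set.seed(1)

# Flattened activation maps from the previous layers (length 8 is arbitrary)
features <- runif(8)

# Every input is connected to every output unit: an 8 x 2 weight matrix plus a bias
weights <- matrix(rnorm(8 * 2), nrow = 8, ncol = 2)
bias <- c(0, 0)

# Softmax turns the raw outputs into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtracting the maximum improves numerical stability
  e / sum(e)
}

logits <- as.vector(features %*% weights) + bias
probs <- softmax(logits)
probs
```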
We begin by creating an empty sequential model.
Now we can add some additional layers. Note that objects in Keras are modified in-place so there’s no need for consecutive assignment. In the first layer, we have to specify the shape of our data.
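A minimal sketch of what this can look like with the keras package. The number of filters, kernel size and dense units are illustrative assumptions, not the architecture used in the original post; only the input shape (80, 80, 3) and the two output classes follow from the data described earlier.

```r
library(keras)

# Start from an empty sequential model
model <- keras_model_sequential()

# Layers are added in place, so the result does not need to be assigned back
model %>%
  # The first layer must declare the shape of one sample: 80 x 80 pixels, 3 channels
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(80, 80, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # Flatten the activation maps before the fully connected layers
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  # Two output units: non-ship and ship
  layer_dense(units = 2, activation = "softmax")

summary(model)
```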
After building the architecture for our CNN, we have to configure it for training. We must specify the loss function, optimizer and additional metrics for evaluation. For example, we can use stochastic gradient descent as an optimization method and cross-entropy as a loss function.
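The configuration step might look like the sketch below. `compile_for_training` is a hypothetical helper written for this example so that it works with any model; the categorical cross-entropy loss is an assumption that matches a two-column binary label matrix, and the optimizer is left at its default settings.

```r
library(keras)

# Hypothetical helper (not from the original post): configure a model for training
compile_for_training <- function(model) {
  model %>% compile(
    loss = "categorical_crossentropy",  # cross-entropy for one-hot encoded labels
    optimizer = optimizer_sgd(),        # stochastic gradient descent, default settings
    metrics = "accuracy"                # additional metric reported during training
  )
}
```

Calling `compile_for_training(model)` on a sequential model configures it in place, just as adding layers does.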
Finally, we are ready to fit the model but there is one more thing we can do. If we want to have a good and quick visualization of our results, we can run a visualization tool called TensorBoard.
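One way to wire this up, sketched as a hypothetical helper. The function name, the log directory, the number of epochs, the batch size and the validation split are all assumptions for illustration; `tensorboard()` and `callback_tensorboard()` are the functions the keras package provides for this purpose.

```r
library(keras)

# Hypothetical helper: train a compiled model while logging to TensorBoard
fit_with_tensorboard <- function(model, x_train, y_train,
                                 log_dir = "logs/ships",
                                 epochs = 20, batch_size = 32,
                                 launch = interactive()) {
  # Open the TensorBoard UI; curves show up after the first epoch has finished
  if (launch) tensorboard(log_dir)

  model %>% fit(
    x_train, y_train,
    epochs = epochs,
    batch_size = batch_size,
    validation_split = 0.2,                    # hold out part of the training data
    callbacks = callback_tensorboard(log_dir)  # write logs that TensorBoard reads
  )
}
```

The function returns the training history, which can also be inspected directly with `plot()` if TensorBoard is not available.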
The last thing to do is to get evaluation metrics and predictions from the test set.
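A sketch of this last step, again written as a hypothetical helper so that it does not depend on any particular model. It assumes the model outputs one probability per class, with the first column standing for non-ship (0) and the second for ship (1).

```r
library(keras)

# Hypothetical helper: compute test metrics and class predictions
evaluate_on_test <- function(model, x_test, y_test) {
  # Loss and any metrics that were requested when the model was compiled
  scores <- model %>% evaluate(x_test, y_test, verbose = 0)

  # One row of class probabilities per test picture
  probs <- model %>% predict(x_test)

  # Column with the highest probability; column 1 is class 0, column 2 is class 1
  predicted <- max.col(probs, ties.method = "first") - 1

  list(scores = scores, probabilities = probs, predicted = predicted)
}
```

Comparing `predicted` with the true labels, for example through `table()`, gives a quick confusion matrix.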
As we can see, the model leaves room for improvement. It has a low accuracy (.075) and a high cross-entropy loss (0.52). It is, however, a good introduction to Keras. We are going to explore ways of improving the network and achieving better results in part two. See you soon!