Recognizing Animals in Photos: Building an AI Model for Object Recognition

<p style="text-align: left;"><i>Updated: September 26, 2020.</i></p> <blockquote><i>Our model for recognizing specific animals in images is a neural network consisting of multiple layers. The initial layers are already good at understanding the world in general, so we only need to train the final layers instead of "reinventing the wheel".</i></blockquote> <h2>Object Detection - Transfer Learning</h2> Visual recognition with object detection transfer learning has been gaining popularity in biodiversity preservation and management. Since launching our AI for Good initiative, we have been working with biodiversity researchers and practitioners to deliver wildlife image recognition machine learning models and tools. Our first foray into this area was our project for Wild Detect, which aligned with one of our goals at Appsilon -- to use data science consulting to aid in the preservation and management of our planet’s wildlife and environment. The goal was to build a model for visual recognition of specific kinds of animals. Since I cannot publish the original model, I will use a different dataset to build a Proof of Concept deep learning model and demonstrate how I approached the problem. I first set out to choose the animal species that I would use for developing the model. Last year I gave a talk at <a href="https://noti.st/marekrogala/FYjAQP/using-deep-learning-on-satellite-imagery-to-get-a-business-edge" target="_blank" rel="noopener noreferrer">useR! in Brisbane</a>, which was quite a journey for me. I had a chance to visit the Koala Sanctuary there. Sadly, koalas are severely affected by climate change, and they were recently declared <a href="https://www.lifegate.com/people/news/australia-koalas-functionally-extinct" target="_blank" rel="noopener noreferrer">functionally extinct</a>. To highlight this problem, I decided to choose Australian animals -- koalas and kangaroos -- for the purposes of this article. Visual recognition can be a powerful tool in many industries besides wildlife stewardship, including retail, defense, insurance (claims verification), and manufacturing (quality control). Well-trained modern deep neural networks can give us very accurate results for a wide range of problems. Table of contents: <ul><li><a href="#dataset">The Dataset for Object Detection and Transfer Learning</a></li><li><a href="#model">The Convolutional Model We Used</a></li><li><a href="#training">Training an Object Detection Transfer Learning Model</a></li><li><a href="#interface">The Interface around the Machine Learning Model</a></li><li><a href="#summary">Summing up Object Detection and Transfer Learning</a></li></ul> <hr /> <h2 id="dataset"><span class="c5 c12">The Dataset for Object Detection and Transfer Learning</span></h2> Google Images is a good resource for building such proof of concept models. The <a href="https://github.com/toffebjorkskog/ml-tools/blob/master/gi2ds.md" target="_blank" rel="noopener noreferrer">gi2ds</a> tool assists in the process of building a dataset in three simple steps: <ul><li style="font-weight: 400;">Run a Google Images search for each class to be included in the dataset</li><li style="font-weight: 400;">Run the gi2ds JavaScript snippet in the browser console</li><li style="font-weight: 400;">Review and exclude unwanted images</li></ul> The tool then produces a list of image URLs for each class, which can be downloaded with a short script, as shown below.
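<p class="c0">Here is a minimal sketch of that download step (an illustration only, not the exact script we used). It assumes the URL list exported by gi2ds for each class has been saved to a plain-text file such as <code>urls_koala.txt</code>, with one URL per line:</p>
<pre><code class="language-python"># Download the images listed in the URL files exported by gi2ds.
# Assumes urls_koala.txt, urls_kangaroo.txt, and urls_other.txt sit next
# to this script; adjust the file names and output paths to your setup.
from pathlib import Path

import requests

CLASSES = ["koala", "kangaroo", "other"]

for cls in CLASSES:
    out_dir = Path("data") / cls
    out_dir.mkdir(parents=True, exist_ok=True)
    urls = Path(f"urls_{cls}.txt").read_text().splitlines()
    for i, url in enumerate(urls):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            (out_dir / f"{cls}_{i:04d}.jpg").write_bytes(response.content)
        except requests.RequestException:
            # Dead or slow links are common in scraped URL lists; skip them.
            continue
</code></pre>
<p class="c0">After downloading, it is worth opening the class folders once more and deleting corrupted or mislabeled files before training.</p>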
<img class="size-full wp-image-15850" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d6da13d4a2a1c0fc023e_f9841003_1.webp" alt="Image 1 - Wildlife dataset for machine learning" width="400" height="400" /> Image 1 - Wildlife dataset for machine learning <p class="c0">In our case, we gathered images for three classes: “koala,” “kangaroo,” and “other” (images of Australian wilderness without any koalas or kangaroos). 20% of the images were set aside as a validation set.</p> <h2 id="model"><span class="c5">The Convolutional Model We Used</span></h2> <p class="c0">The model is a convolutional neural network (CNN). CNNs excel at visual recognition tasks. Specifically, we used a ResNet architecture, originally developed by a Microsoft Research team, which remains a state-of-the-art choice for visual tasks. Behind its exceptional accuracy are shortcut connections that let the signal skip groups of layers, which helps eliminate much of the <a href="https://en.wikipedia.org/wiki/Vanishing_gradient_problem" target="_blank" rel="noopener noreferrer">vanishing gradients</a> problem and ultimately yields a lower training error. Fascinatingly, the human brain appears to do something similar.</p> <img class="size-full wp-image-15852" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d6db607aaefd97360064_7d03680b_2.gif" alt="Image 2 - ResNet architecture" width="239" height="400" /> Image 2 - ResNet architecture <p class="c0"><span class="c4">For the deep learning framework, we used PyTorch. We find it more convenient than TensorFlow for several reasons, including the fact that PyTorch provides an <a href="https://pytorch.org/vision/stable/models.html" target="_blank" rel="noopener noreferrer">official set of pre-trained models</a> that can be used for a wide range of visual problems.</span></p> <h2 id="training"><span class="c12 c5">Training an Object Detection Transfer Learning Model</span></h2> Instead of training the model from scratch, we used a version of <a href="https://arxiv.org/abs/1512.03385" target="_blank" rel="noopener noreferrer">ResNet</a> pre-trained on the ImageNet dataset. ImageNet is a dataset of over 14 million annotated images that underpins the <a href="http://image-net.org/challenges/LSVRC/" target="_blank" rel="noopener noreferrer">Large Scale Visual Recognition Challenge</a> (ILSVRC). This technique of reusing a pre-trained model for a different task is called <b>transfer learning</b>, and it allows for achieving excellent results quickly. Our model is a neural network consisting of multiple layers, and the initial layers of the pre-trained model are already quite effective at understanding the world in general, so we only needed to train the final layers. This also greatly reduces training time and lets us achieve good results with only a few hundred images for each of the three classes: “koala,” “kangaroo,” and “other.” <p class="c0"><b>We started by training the last 2 layers</b>, which gave a 1.98% error rate. To lower it further, we fine-tuned all the layers, which brought the error rate down to 1.58%. Naturally, for a production model, we would do more fine-tuning and data augmentation, and we would also need to gather a more realistic dataset. That being said, this model already demonstrates the kind of solution that can be achieved.</p>
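<p class="c0">To make the workflow concrete, here is a simplified PyTorch sketch of the two-stage procedure described above: load a ResNet pre-trained on ImageNet, replace the final layer with a three-class head, train only that head, and then unfreeze everything for a short fine-tuning pass at a lower learning rate. It is an illustration under assumed paths, ResNet variant, and hyperparameters, not the original training code.</p>
<pre><code class="language-python"># Transfer learning sketch: fine-tune an ImageNet-pretrained ResNet on the
# "koala" / "kangaroo" / "other" folders created earlier. The ResNet variant,
# epochs, and learning rates below are illustrative assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder("data", transform=transform)
n_valid = int(0.2 * len(dataset))              # hold out 20% for validation
train_ds, valid_ds = random_split(dataset, [len(dataset) - n_valid, n_valid])
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=32)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():               # freeze the pre-trained backbone
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 3)  # new head for our 3 classes
model = model.to(device)

loss_fn = nn.CrossEntropyLoss()

def run_epochs(epochs, optimizer):
    model.train()
    for _ in range(epochs):
        for images, labels in train_dl:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

# Stage 1: train only the new head.
run_epochs(4, torch.optim.Adam(model.fc.parameters(), lr=1e-3))

# Stage 2: unfreeze everything and fine-tune with a lower learning rate.
for param in model.parameters():
    param.requires_grad = True
run_epochs(2, torch.optim.Adam(model.parameters(), lr=1e-5))

# Validation error after fine-tuning.
model.eval()
correct = 0
with torch.no_grad():
    for images, labels in valid_dl:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
print(f"validation error: {1 - correct / len(valid_ds):.2%}")
</code></pre>
<p class="c0">In a production setting we would also add data augmentation, a learning-rate schedule, and more careful validation at this stage.</p>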
<img class="size-full wp-image-15854" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0207c0413bbd3ff5bef34_3-5.webp" alt="Image 3 - Activation heatmap on a Kangaroo image" width="400" height="132" /> Image 3 - Activation heatmap on a Kangaroo image <h2 id="interface"><span class="c5">The Interface Around the Machine Learning Model</span></h2> Once the model has been taught to spot, in this case, kangaroos and koalas, the results can be made accessible in a variety of ways -- an API, a Shiny app, or a Python web application. We believe it is crucial to have a usable interface for a model so that the findings of our neural network are available and easily accessible to users, who may also wish to see how the model arrived at a given conclusion. The user interface, which enables interaction between the human and the neural network, is just as important as the actual artificial intelligence part. In the case of the Wild Detect project, we were contributing to a standalone device that would eventually be installed in the wilderness, on a ranch, or in a nature preserve, and that could be queried regularly as new images come in from the built-in camera. <p class="c0">We are <a href="https://appsilon.com/shiny/">experts in building analytical web apps</a>, so for the POC, I built an app that allows for playing around with the model. Here is what it looks like:</p> <img class="size-full wp-image-15856" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d6dc7a511f178f8ded1d_f1b6153a_4.webp" alt="Image 4 - UI around our ML model" width="400" height="189" /> Image 4 - UI around our ML model
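<p class="c0">For illustration, here is a minimal Python sketch of what such an API endpoint could look like. The endpoint path, model file name, and class order are assumptions for this example, not the Wild Detect implementation:</p>
<pre><code class="language-python"># Minimal prediction endpoint wrapped around the fine-tuned model.
# Assumes the model was saved with torch.save(model, "koala_kangaroo_resnet.pt").
import io

import torch
from flask import Flask, jsonify, request
from PIL import Image
from torchvision import transforms

app = Flask(__name__)
CLASSES = ["kangaroo", "koala", "other"]  # ImageFolder orders classes alphabetically

model = torch.load("koala_kangaroo_resnet.pt", map_location="cpu")
model.eval()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@app.route("/predict", methods=["POST"])
def predict():
    image = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    with torch.no_grad():
        probs = torch.softmax(model(transform(image).unsqueeze(0)), dim=1)[0]
    best = int(probs.argmax())
    return jsonify({"label": CLASSES[best], "confidence": float(probs[best])})

if __name__ == "__main__":
    app.run(port=5000)
</code></pre>
<p class="c0">With the app running, a prediction can be requested with, for example, <code>curl -F "image=@photo.jpg" http://localhost:5000/predict</code>. The same model could just as easily sit behind a Shiny front end.</p>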
<h2 id="summary"><span class="c12 c5">Summing up Object Detection and Transfer Learning</span></h2> AI can be very accurate in recognizing objects, animals, and people in images. Using transfer learning for object detection makes business applications even more feasible and allows us to work with smaller datasets, which are often all we have. Both accuracy and the effort needed to build a model matter. They matter to a non-profit organization counting the few remaining koalas on the planet, and to a company counting inventory in its warehouse. They matter to Wild Detect, for whom we delivered a successful Proof of Concept, and we are excited to continue the journey with them. Organizations that manage large facilities, inventories, and land can all benefit from accurate visual recognition of objects. See other Appsilon articles about object detection below. <h3><strong>Resources</strong></h3><ul><li style="font-weight: 400;">Need help with ML solutions? Reach out to the <a href="https://appsilon.com/computer-vision/" target="_blank" rel="noopener noreferrer">Appsilon ML Team</a></li><li style="font-weight: 400;"><a href="https://appsilon.com/ship-recognition-in-satellite-imagery-part-ii/">Ship recognition in satellite imagery</a></li><li style="font-weight: 400;"><a href="https://appsilon.com/ai-for-assisting-in-natural-disaster-relief-efforts-the-xview2-competition/">Natural disaster relief assistance</a></li><li style="font-weight: 400;"><a href="https://appsilon.com/using-ai-identify-wildlife-camera-trap-images-serengeti/">Identifying wildlife in the Serengeti</a></li><li style="font-weight: 400;">The <a href="https://appsilon.com/object-detection-yolo-algorithm/">YOLO Algorithm and YOLO Object Detection</a></li><li style="font-weight: 400;"><a href="https://appsilon.com/ai-for-wildlife-image-classification-appsilon-ai4g-project-receives-google-grant/">Assisting biodiversity conservation efforts in Gabon</a></li></ul>
