When building a CNN (convolutional neural network), there are some things you'll need and some things you should consider. You'll need access to a GPU, and you'll need a lot of labeled images. And when I say a lot, I mean a minimum of around 1,000 per class. However, using transfer learning you may be able to get away with fewer. TensorFlow- and Theano-backed packages such as Keras let you use what pre-trained models have learned as the input to your newly created model, and without a doubt this helps model performance metrics. It works especially well if your training images are fairly closely related to the ImageNet dataset. The main decision is whether to build the CNN on top of a pre-trained model or to take a shot at building it from scratch.
Regarding transfer learning, the reality is that most real-world applications of CNNs for image recognition are not going to be that similar to the ImageNet base of images. Not all is lost, as you can still use those pre-trained models to help you achieve higher model accuracy. But what's the cost? I ran a test on an image recognition project. Here are the considerations with using transfer learning:
- training time – transfer learning could substantially increase your processing time, depending on your model architecture
- size of model – instead of a model that is 50 MB, how about 300 MB? For some people in academia this is no big deal. But if we're talking about a web service, or having the model run locally on a phone or a simple CPU, smaller is better
- can only use RGB images when using an ImageNet pre-trained model. Bummer, because many times grayscale is all that is needed to perform well, and RGB requires more processing power and increases the size of the final model
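To put the model size point in rough numbers, here is a back-of-the-envelope sketch. It assumes the weights are stored as 32-bit floats (4 bytes each) and ignores any file format overhead; the function names are mine, for illustration only.

```python
def model_size_mb(n_params, bytes_per_param=4):
    """Approximate size of the stored weights in MB (1 MB = 1e6 bytes).

    Assumes float32 weights (4 bytes each) and ignores file overhead.
    """
    return n_params * bytes_per_param / 1e6


def params_for_size(size_mb, bytes_per_param=4):
    """Inverse of model_size_mb: roughly how many weights fit in size_mb."""
    return int(size_mb * 1e6 / bytes_per_param)


if __name__ == "__main__":
    # The two sizes mentioned in the list above.
    for size in (50, 300):
        print(f"{size} MB is roughly {params_for_size(size):,} float32 weights")
```

Under those assumptions, a 50 MB model holds about 12.5 million weights and a 300 MB model about 75 million, which is the gap you would be shipping to a phone or a web service.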
To understand the trade-offs between a CNN backed by transfer learning and a CNN built from scratch, I tested both on a small dataset I'm working on. Details on my dataset:
- 2 classes; class 0: 250 labeled images, class 1: 1,000 labeled images (notice the classes are unbalanced? It's a real-world problem)
- images do not closely resemble ImageNet (again, this is more real-world)
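One common way to deal with unbalanced classes like these is to weight each class inversely to its frequency during training. I'm not claiming this is what was done in this test; it is just a minimal sketch of the usual "balanced" heuristic, and the function name is hypothetical.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get a weight above 1, common classes a weight below 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    # Image counts from the dataset description above.
    counts = {0: 250, 1: 1000}
    print(balanced_class_weights(counts))  # {0: 2.5, 1: 0.625}
```

A dictionary like this can be passed to Keras through the `class_weight` argument of `model.fit`, so that mistakes on the rarer class cost more in the loss.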
I'm running two models: one will be a CNN from scratch, and the other will leverage transfer learning, in which I'll freeze the top 7 layers.
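For the transfer learning model, freezing layers in Keras looks roughly like the sketch below. Several things here are assumptions on my part: the base network (VGG16), the input size, the pooling and single sigmoid output on top, and reading "top 7 layers" as the first seven entries of `base.layers`. The default `weights=None` only avoids a download; pass `weights="imagenet"` to actually start from the pre-trained weights.

```python
from tensorflow import keras


def build_transfer_model(n_frozen=7, weights=None, input_shape=(64, 64, 3)):
    """Binary classifier on a VGG16 base with the first n_frozen layers frozen.

    VGG16, the input shape, and the small head are illustrative assumptions.
    """
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # Frozen layers keep their weights fixed during training.
    for layer in base.layers[:n_frozen]:
        layer.trainable = False

    # Small classification head for a 2-class problem.
    x = keras.layers.GlobalAveragePooling2D()(base.output)
    output = keras.layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs=base.input, outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


if __name__ == "__main__":
    model = build_transfer_model()
    for layer in model.layers:
        print(layer.name, "trainable" if layer.trainable else "frozen")
```

Note that the input has to be 3-channel, which is the RGB restriction mentioned earlier.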
Both will use image augmentation, edge detection, and cross-validation to get the most out of the limited images in my training set. I'll be running up to 300 epochs, with a patience of 10 and callbacks to minimize log loss. I'm sure I could spend more time trying to make marginal improvements to both models, but in this case I wanted to time-box the initial model building to help me decide which path to take.
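If "patience" is new to you, the rule is simple: stop once the validation loss has gone a set number of epochs in a row without beating its best value. In Keras the `EarlyStopping` callback handles this for you; the plain-Python sketch below only illustrates the rule, with a hypothetical function name and made-up loss values.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the loss has failed to improve
    on its best value for `patience` consecutive epochs. If that never
    happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    # Made-up validation losses, for illustration only.
    losses = [0.90, 0.80, 0.70, 0.75, 0.72, 0.71, 0.74]
    print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

With a cap of 300 epochs and a patience of 10, most runs end well before the cap, which is what makes time-boxing practical.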
Results of CNN from scratch (on the smaller, more difficult class: class 1)
Results of CNN with transfer learning (on the smaller, more difficult class: class 1)
No surprise, the F1 score is better on the model with transfer learning, at 0.93 vs 0.91. But that comes at the expense of a model that is 10x as large. You make the call on the path you choose.
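As a reminder of what the F1 score measures, it is the harmonic mean of precision and recall for the class you care about. Here is a minimal sketch; the counts in the example are invented for illustration and are not the results from this test.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not taken from the experiment.
    print(round(f1_from_counts(tp=90, fp=10, fn=10), 2))  # 0.9
```

Because F1 ignores true negatives, it is a reasonable choice for an unbalanced dataset where plain accuracy would flatter the larger class.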