Adding Machine Learning to iOS Apps


Use case: using machine learning and the iPhone’s camera, identify certain types of objects in real time.


  1. Create a CNN (convolutional neural network) in Python with Keras, an ML (machine learning) package, using the TensorFlow backend.
  2. Convert the newly trained CNN to the Core ML format so that it can be used on iPhones.
  3. Run the converted Core ML model on the iPhone to make predictions about what the phone’s camera is viewing.
  4. Save the distribution of predicted probabilities from the Core ML model and send it to an API built with Firebase, which is a BaaS (backend as a service).
  5. Return custom output from the API back to the iPhone.

Technologies/languages used: Keras, Python, Swift (iOS), Core ML, Firebase SDK, JavaScript

How the models and algorithms worked together:

I ended up creating a combination of Core ML models, loosely based on OOD (object-oriented design). The first model identifies the domain. If its top prediction has a probability greater than 50%, the app calls the next model, and if that model’s top prediction is also greater than 50%, the app calls the API to get a returned result.
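The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the app's actual code: the stand-in models, the labels (`domain_a`, `issue_1`), and the `call_api` function are all hypothetical, and each "model" is assumed to return a dictionary of label probabilities.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: a classifier takes an image and returns {label: probability}.
Classifier = Callable[[object], Dict[str, float]]

THRESHOLD = 0.5  # the 50% cut-off described in the text


def top_prediction(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, probability) pair with the highest probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def cascade(image, domain_model: Classifier, issue_model: Classifier,
            call_api: Callable[[str, str], str]) -> Optional[str]:
    """Run the domain model, then the second model, then the API.

    Returns None as soon as a stage fails to clear THRESHOLD.
    """
    domain, p_domain = top_prediction(domain_model(image))
    if p_domain <= THRESHOLD:
        return None  # not confident about the domain, so stop here

    issue, p_issue = top_prediction(issue_model(image))
    if p_issue <= THRESHOLD:
        return None  # domain found, but the second model is unsure

    return call_api(domain, issue)


# Hypothetical stand-ins, used only to exercise the control flow.
def fake_domain_model(image):
    return {"domain_a": 0.8, "other": 0.2}


def fake_issue_model(image):
    return {"issue_1": 0.7, "issue_2": 0.3}


def fake_api(domain, issue):
    return f"{domain}/{issue}"


result = cascade(None, fake_domain_model, fake_issue_model, fake_api)
```

With these stand-ins both stages clear the threshold, so `result` is whatever the fake API returns. If either top probability is 50% or lower, `cascade` returns `None` and the API is never called.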

High-level diagram of how iOS can use the models to make useful predictions:

Here is the iOS Swift function that calls both models.

import AVFoundation
import Vision

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // grab the current camera frame; both requests need it
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // initial model always called for domain detection
    guard let model_one = try? VNCoreMLModel(for: imagenet_ut().model) else { return }
    let request = VNCoreMLRequest(model: model_one) { (finishedRequest, error) in
        guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
        guard let observation = results.first else { return }
        DispatchQueue.main.async {
            // top prediction's confidence as a whole-number percentage
            confidence_one = Int(observation.confidence * 100)
        }
    }
    // executes request
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])

    // model for issue 1
    let chosen_all_model = getAllModel(cise: self.Model.cise)
    guard let model_two = try? VNCoreMLModel(for: chosen_all_model) else { return }
    let request_two = VNCoreMLRequest(model: model_two) { (finishedRequest, error) in
        guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
        guard let observation = results.first else { return }
        DispatchQueue.main.async {
            _confidence_two = Int(observation.confidence * 100)
        }
    }
    // the rest of the function is not shown in this excerpt
}


In the complete iOS project, I have 12 models available to run, depending on the context of what the user is doing. The API created on Firebase can handle various combinations of calls based on these contexts and send a useful prediction back to the user.


CNN: Transfer Learning vs build from scratch


When building a CNN (convolutional neural network), there are some things you’ll need and some things you should consider. You’ll need access to a GPU, and you’ll need a lot of labeled images. And when I say a lot, it could be a minimum of 1,000 per class. However, with transfer learning you may be able to get away with fewer. Packages such as Keras, backed by TensorFlow or Theano, let you use what a pre-trained model has learned as the input to your newly created model, and without a doubt this helps model performance metrics. It works especially well if your training images are somewhat closely related to the ImageNet dataset. The main decision is whether to build the CNN on top of a pre-trained model or to take a shot at building it from scratch.

Regarding transfer learning, the reality is that most real-world applications of CNNs for image recognition are not going to be that similar to the ImageNet images. Not all is lost, as you can still use those pre-trained models to help you achieve higher model accuracy. But what’s the cost? I ran a test on an image recognition project, and here are the considerations with using transfer learning:

  1. Training time – transfer learning could substantially increase your processing time, depending on your model architecture.
  2. Size of model – instead of a model that is 50 MB, how about 300 MB? For some people in academia this is no big deal. But if you’re talking about a web service, or having the model run locally on a phone or a simple CPU, smaller is better.
  3. Color channels – you can only use RGB images with an ImageNet pre-trained model. Bummer, because many times grayscale is all that is needed to perform well, and RGB requires more processing power and increases the size of the final model.
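To make the size and channel points concrete, here is a rough back-of-the-envelope sketch. It assumes every parameter is stored as a 32-bit float and uses a hypothetical first layer (3x3 kernels, 32 filters). The parameter counts are made up to land on the 50 MB and 300 MB figures above and are not measurements from my models.

```python
BYTES_PER_FLOAT32 = 4


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return kernel_h * kernel_w * in_channels * filters + filters


def size_mb(param_count):
    """Approximate model size if every parameter is a float32."""
    return param_count * BYTES_PER_FLOAT32 / (1024 ** 2)


# Hypothetical first layer: 3x3 kernels, 32 filters.
rgb_first_layer = conv2d_params(3, 3, in_channels=3, filters=32)   # 896
gray_first_layer = conv2d_params(3, 3, in_channels=1, filters=32)  # 320

# Illustrative parameter counts chosen to match the sizes in the text.
small_model_mb = size_mb(13_107_200)  # 50.0
large_model_mb = size_mb(78_643_200)  # 300.0

# Every input image also carries 3x the values in RGB vs. grayscale.
rgb_input_values = 224 * 224 * 3
gray_input_values = 224 * 224 * 1
```

In a plain CNN like this, the weight difference between RGB and grayscale is confined to the first layer, but every image fed through the network is three times larger, which is where the extra processing shows up.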

To understand the trade-offs between a CNN backed by transfer learning and a CNN built from scratch, I tested both on a small dataset I’m working on. Details on my dataset:

  • 2 classes; class 0: 250 labeled images, class 1: 1,000 labeled images (notice the classes are unbalanced? It’s a real-world problem)
  • images do not closely resemble ImageNet (again, this is more real-world)
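One common way to compensate for unbalanced classes like these is to weight each class inversely to its frequency during training. The sketch below uses the "balanced" heuristic (total samples divided by number of classes times the class count). It is shown only as an illustration of how lopsided a 250 vs. 1,000 split is, and it is an assumption on my part rather than a step used in the test that follows.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Image counts from the dataset described above.
counts = {0: 250, 1: 1000}
weights = balanced_class_weights(counts)  # {0: 2.5, 1: 0.625}
```

A dictionary in this shape can be passed to the `class_weight` argument of Keras `fit`, so that mistakes on the rarer class cost more in the loss.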

I’m running two models: one is a CNN built from scratch, and the other leverages transfer learning, in which I’ll freeze the top 7 layers.

Both will use image augmentation, edge detection, and cross-validation to get the most out of the limited images in my training set. I’ll run up to 300 epochs, with a patience of 10 and callbacks to minimize log loss. I’m sure I could spend more time trying to make marginal improvements to both models, but in this case I wanted to time-box the initial model building to help me decide which path to take.
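For anyone unfamiliar with patience, here is a minimal pure-Python sketch of the early stopping rule: training stops once the validation loss has gone 10 epochs without improving, or at 300 epochs, whichever comes first. The loss values are simulated, and the function is a simplified stand-in for what an early stopping callback does, not the callback itself.

```python
def early_stopping_epoch(val_losses, patience=10, max_epochs=300):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch where the validation loss has
    failed to improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return min(len(val_losses), max_epochs), best_epoch


# Simulated log loss: improves for 20 epochs, then plateaus at a worse value.
simulated = [1.0 - 0.02 * i for i in range(20)] + [0.7] * 50
stop_epoch, best_epoch = early_stopping_epoch(simulated)
```

With this simulated curve the best epoch is 20 and training halts at epoch 30, far short of the 300-epoch ceiling. That is why a high epoch limit is cheap to set when patience is in place.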

Results of CNN from scratch (on the smaller, more difficult class: class 1)

Results of CNN with transfer learning (on the smaller, more difficult class: class 1)

No surprise, the F1 score is better on the model with transfer learning, at 0.93 vs. 0.91. But that comes at the expense of a model that is 10x as large. You make the call on the path you choose.
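As a reminder of what the F1 score measures, it is the harmonic mean of precision and recall for the class in question. The sketch below computes it from confusion-matrix counts. The counts are hypothetical and chosen only to show the arithmetic; they are not the results from my two models.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = harmonic mean of precision and recall for one class."""
    if true_pos == 0:
        return 0.0  # no correct positives means precision and recall are 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 45 hits, 5 false alarms, 5 misses.
example_f1 = f1_score(true_pos=45, false_pos=5, false_neg=5)  # about 0.9
```

Because F1 ignores true negatives, it is a more honest metric than plain accuracy when one class has far fewer images than the other.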