Computer Vision on Mobile

🗓 May 31, 2026 ⏱ 2 min read

The camera is the AI superpower of phones

A large share of mobile AI features depend on the camera: identifying objects, scanning documents, applying effects, measuring, translating signs. This lesson covers the common vision tasks and how to run them smoothly in real time.

The main vision tasks

  • Image classification — “what is this a picture of?” (one label for the whole image).
  • Object detection — “what objects are here and where?” (boxes + labels).
  • Segmentation — which pixels belong to which object (e.g. background removal).
  • Face/pose landmarks — key points for filters, fitness, AR.
  • OCR — reading text from the scene.
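
The tasks above differ mainly in the shape of what the model returns. The sketch below models those shapes in plain Kotlin. Every type name here (`VisionResult`, `Classification`, `Region`, and so on) is hypothetical and chosen for illustration; real SDKs define their own result classes.

```kotlin
// Hypothetical result types, one per task; not taken from any SDK.
data class Label(val name: String, val score: Float)
data class Point(val x: Float, val y: Float)
data class Region(val left: Float, val top: Float, val right: Float, val bottom: Float)

sealed interface VisionResult
data class Classification(val labels: List<Label>) : VisionResult              // one label set per image
data class Detection(val objects: List<Pair<Region, Label>>) : VisionResult    // boxes + labels
data class Segmentation(val width: Int, val height: Int, val classPerPixel: List<Int>) : VisionResult
data class Landmarks(val points: List<Point>) : VisionResult                   // face/pose key points
data class RecognizedText(val lines: List<String>) : VisionResult              // OCR output

// Summarizes any result; the sealed interface makes the `when` exhaustive.
fun describe(result: VisionResult): String = when (result) {
    is Classification -> "classification: " + (result.labels.maxByOrNull { it.score }?.name ?: "none")
    is Detection -> "detection: " + result.objects.size + " object(s)"
    is Segmentation -> "segmentation: " + result.width + "x" + result.height + " mask"
    is Landmarks -> "landmarks: " + result.points.size + " point(s)"
    is RecognizedText -> "text: " + result.lines.joinToString(" / ")
}

fun main() {
    val result: VisionResult = Classification(listOf(Label("cat", 0.9f), Label("dog", 0.1f)))
    println(describe(result))   // classification: cat
}
```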

Still image vs real-time

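Running a model on a single captured photo is easy. The challenge is real-time processing of the live camera feed: frames arrive many times per second, and each one has to be handled without visible lag and without overheating the device. That means limiting how much work is done per frame and where it runs.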
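
To make "many frames per second" concrete, it can help to work out the time budget per frame. The sketch below does that arithmetic; the function names and the 80 ms inference time are assumptions for illustration, not measurements.

```kotlin
import kotlin.math.ceil

/** Milliseconds available per frame at a given camera frame rate. */
fun frameBudgetMs(fps: Int): Double = 1000.0 / fps

/** Smallest N such that running inference on every Nth frame keeps up with the camera. */
fun minFrameStride(fps: Int, inferenceMs: Double): Int =
    maxOf(1, ceil(inferenceMs / frameBudgetMs(fps)).toInt())

fun main() {
    // Assumed numbers: a 30 fps feed and a model that takes 80 ms per inference.
    println(frameBudgetMs(30))          // roughly 33 ms per frame
    println(minFrameStride(30, 80.0))   // 3
}
```

Under these assumed numbers, 80 ms of inference spans about 2.4 frame intervals, so the model can keep up only if it runs on every third frame or less often.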

Wiring up the camera + model

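// Android: CameraX analyzer runs your model on each frame.
// `executor` should be a background executor so inference stays off the main thread.
imageAnalysis.setAnalyzer(executor) { imageProxy ->
    try {
        val bitmap = imageProxy.toBitmap()
        val results = classifier.classify(TensorImage.fromBitmap(bitmap))
        runOnUiThread { overlay.show(results) }   // only the drawing happens on the UI thread
    } finally {
        imageProxy.close()    // always close the frame, even if inference throws
    }
}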

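On iOS, the equivalent is an AVCaptureVideoDataOutput whose sample buffer delegate (AVCaptureVideoDataOutputSampleBufferDelegate) receives each frame and hands it to Vision/Core ML, typically through a VNImageRequestHandler running a VNCoreMLRequest.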

Keeping real-time fast

  • Throttle frames — you rarely need 30fps inference; process every 2nd–5th frame.
  • Use a small, fast model — accuracy vs speed is a real trade-off on live video.
  • Hardware acceleration — GPU/NPU delegates (Android) or the Neural Engine (iOS).
  • Downscale input — feed the model a smaller image when possible.
  • Drop frames under load — never queue up work faster than you can process it.
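
Throttling and dropping from the list above can be combined in one small gate that the analyzer consults before doing any work. This is a minimal sketch in plain Kotlin; `FrameGate`, `tryAcquire`, and `release` are hypothetical names, not part of CameraX.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicLong

/**
 * Decides whether a frame should be analyzed.
 * - stride: consider only every Nth frame (throttling).
 * - busy flag: reject frames that arrive while an inference is still running (dropping).
 */
class FrameGate(private val stride: Int) {
    init { require(stride >= 1) { "stride must be >= 1" } }

    private val frameCount = AtomicLong(0)
    private val busy = AtomicBoolean(false)

    /** Returns true if the caller should run inference; the caller must then call release(). */
    fun tryAcquire(): Boolean {
        val index = frameCount.getAndIncrement()
        if (index % stride != 0L) return false      // throttled: not an Nth frame
        return busy.compareAndSet(false, true)      // dropped if the previous inference is unfinished
    }

    /** Marks the current inference as finished so the next eligible frame is accepted. */
    fun release() { busy.set(false) }
}

fun main() {
    val gate = FrameGate(stride = 3)
    // Simulate nine frames where each inference finishes before the next frame arrives.
    val analyzed = (0 until 9).filter {
        val accepted = gate.tryAcquire()
        if (accepted) gate.release()
        accepted
    }
    println(analyzed)   // [0, 3, 6]
}
```

In an analyzer, rejected frames would still be closed immediately, and `release()` would go in a `finally` block next to the frame's `close()`. CameraX's ImageAnalysis defaults to the STRATEGY_KEEP_ONLY_LATEST backpressure strategy, which already discards frames that arrive while the analyzer is busy, so on Android a gate like this mainly adds the every-Nth-frame stride.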

Drawing results over the camera

Detections are returned in the model’s coordinate space; you must map them to screen coordinates and draw boxes/labels on an overlay above the camera preview, accounting for rotation and scaling.
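
The mapping described above can be sketched in two steps: rotate the box into the upright image, then scale and offset it into the view. This sketch assumes boxes are given in pixels of the analyzed frame, that the preview uses center-crop ("fill") scaling, and that no front-camera mirroring is involved. `Box`, `rotateBox`, and `imageToView` are hypothetical helpers, and the sizes in `main` are made-up example values.

```kotlin
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/** Rotates a box clockwise by 0/90/180/270 degrees inside an image of size w x h. */
fun rotateBox(b: Box, w: Float, h: Float, degrees: Int): Box = when (degrees) {
    0 -> b
    90 -> Box(h - b.bottom, b.left, h - b.top, b.right)            // result lives in an h x w image
    180 -> Box(w - b.right, h - b.bottom, w - b.left, h - b.top)
    270 -> Box(b.top, w - b.right, b.bottom, w - b.left)           // result lives in an h x w image
    else -> throw IllegalArgumentException("rotation must be 0, 90, 180 or 270")
}

/** Maps a box from upright image pixels to view pixels, assuming center-crop scaling. */
fun imageToView(b: Box, imgW: Float, imgH: Float, viewW: Float, viewH: Float): Box {
    val scale = maxOf(viewW / imgW, viewH / imgH)   // fill the view, cropping the overflow
    val dx = (viewW - imgW * scale) / 2f            // negative when the image is cropped horizontally
    val dy = (viewH - imgH * scale) / 2f
    return Box(b.left * scale + dx, b.top * scale + dy, b.right * scale + dx, b.bottom * scale + dy)
}

fun main() {
    // Example values: a 640x480 frame that needs a 90-degree rotation, shown in a 1080x1920 view.
    val detected = Box(100f, 50f, 300f, 250f)
    val upright = rotateBox(detected, 640f, 480f, 90)               // image is now 480x640
    val onScreen = imageToView(upright, 480f, 640f, 1080f, 1920f)
    println(upright)    // Box(left=230.0, top=100.0, right=430.0, bottom=300.0)
    println(onScreen)   // Box(left=510.0, top=300.0, right=1110.0, bottom=900.0)
}
```

In this example the right edge lands at 1110, past the 1080-pixel view width, because center-crop scaling cuts off the sides of the image; an overlay would normally clip such boxes to the view bounds.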

Common mistakes

  • Running inference on every frame on the main thread (freeze + heat).
  • Forgetting to close/release camera frames (memory leaks, crashes).
  • Not mapping detection coordinates correctly to the preview (boxes in the wrong place).

Summary: The camera powers a large share of mobile AI features. Use CameraX (Android) or AVFoundation + Vision (iOS) to feed frames to a model. For real-time use, keep models small, throttle and downscale frames, accelerate with hardware, and draw results on an overlay mapped to the preview.