On-Device ML on Android: TensorFlow Lite / LiteRT

🗓 May 31, 2026 ⏱ 2 min read

What is TensorFlow Lite (LiteRT)?

TensorFlow Lite, renamed LiteRT in 2024, is Google’s runtime for running ML models on-device: on Android, iOS, and microcontrollers. You ship a compact .tflite model file inside your app and run inference locally, with no server round-trip and no network connection required.

Why it’s great for mobile

  • Optimised to be small and fast on phones.
  • Can use hardware acceleration (GPU, and the NPU via delegates) for speed and battery savings.
  • Works fully offline and keeps data private.

The typical flow

  1. Add the dependency and bundle a .tflite model in assets/.
  2. Load the model into an Interpreter.
  3. Pre-process your input into the expected tensor.
  4. Run inference.
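  5. Read and interpret the output.

// build.gradle.kts:
//   implementation("org.tensorflow:tensorflow-lite:2.x")
//   implementation("org.tensorflow:tensorflow-lite-support:0.x")  // provides FileUtil

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

val model = FileUtil.loadMappedFile(context, "model.tflite")   // memory-maps assets/model.tflite
val interpreter = Interpreter(model)

// Input/output buffers sized to the model.
// preprocessImage, NUM_CLASSES and labels are placeholders you supply for your own model.
val input = preprocessImage(bitmap)          // e.g. 1x224x224x3 floats
val output = Array(1) { FloatArray(NUM_CLASSES) }

interpreter.run(input, output)               // inference
val scores = output[0]
val topLabel = labels[scores.indices.maxByOrNull { scores[it] } ?: error("model produced no scores")]

interpreter.close()                          // release native resources when you are done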
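
Steps 3 and 5 are where most bugs hide, so here is a minimal sketch of what a helper like preprocessImage and the output-reading step might look like. It assumes a model that takes a 1x224x224x3 float32 input scaled to [0, 1]; that spec, and the names INPUT_SIZE, pixelsToInput and topK, are illustrative assumptions, not part of the TensorFlow Lite API. The pixel array is the kind of ARGB IntArray you would get from Bitmap.getPixels, so the sketch runs on plain Kotlin/JVM without Android.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumed model spec (hypothetical): 1 x 224 x 224 x 3 float32, values in [0, 1].
const val INPUT_SIZE = 224

/** Packs ARGB pixels into a direct float32 buffer in RGB order, dropping alpha. */
fun pixelsToInput(pixels: IntArray, size: Int = INPUT_SIZE): ByteBuffer {
    require(pixels.size == size * size) { "expected ${size * size} pixels, got ${pixels.size}" }
    // 4 bytes per float, 3 channels per pixel, in the platform's native byte order.
    val buffer = ByteBuffer.allocateDirect(4 * size * size * 3).order(ByteOrder.nativeOrder())
    for (p in pixels) {
        buffer.putFloat(((p shr 16) and 0xFF) / 255f) // red
        buffer.putFloat(((p shr 8) and 0xFF) / 255f)  // green
        buffer.putFloat((p and 0xFF) / 255f)          // blue
    }
    buffer.rewind()
    return buffer
}

/** Returns the k highest-scoring (label, score) pairs, best first. */
fun topK(scores: FloatArray, labels: List<String>, k: Int = 3): List<Pair<String, Float>> {
    require(scores.size == labels.size) { "scores and labels must have the same length" }
    return scores.indices
        .sortedByDescending { scores[it] }
        .take(k)
        .map { labels[it] to scores[it] }
}

fun main() {
    // A solid red image stands in for real camera pixels.
    val input = pixelsToInput(IntArray(INPUT_SIZE * INPUT_SIZE) { 0xFFFF0000.toInt() })
    println(input.getFloat(0)) // red channel of the first pixel
    println(topK(floatArrayOf(0.1f, 0.7f, 0.2f), listOf("cat", "dog", "bird"), k = 2))
}
```

If your model expects a different size or a different normalization (for example values in [-1, 1]), change the constants and the scaling to match its spec. The require checks are there so that a size mismatch fails loudly instead of producing garbage scores.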

The easier path: Task Library & Model Maker

Writing raw tensor code is fiddly. Google’s TensorFlow Lite Task Library wraps common jobs (image classification, object detection, text) in simple, high-level classes — far less code and fewer pre-processing bugs.

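// build.gradle.kts: implementation("org.tensorflow:tensorflow-lite-task-vision:0.x")
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.classifier.ImageClassifier
val classifier = ImageClassifier.createFromFile(context, "model.tflite")
val results = classifier.classify(TensorImage.fromBitmap(bitmap))
// results -> List<Classifications>; each entry holds categories with a label and a score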

Hardware acceleration with delegates

A delegate offloads work to faster hardware. The GPU delegate (and NNAPI/NPU on supported devices) can speed up inference noticeably and reduce battery drain, though the actual gain depends on the model and the device, so measure on real hardware.

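// build.gradle.kts: implementation("org.tensorflow:tensorflow-lite-gpu:2.x")
import org.tensorflow.lite.gpu.GpuDelegate
val options = Interpreter.Options().apply { addDelegate(GpuDelegate()) }
val interpreter = Interpreter(model, options)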
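
Because acceleration is only available "on supported devices", apps usually need a CPU fallback for the rest. The sketch below shows one way to structure that: try each backend in order and keep the first one that initialises. It is plain Kotlin with no TensorFlow Lite calls; firstAvailable and the "gpu"/"cpu" factories are hypothetical stand-ins for code that would build an Interpreter with or without a delegate.

```kotlin
/** Tries each named factory in order and returns the first one that initialises. */
fun <T> firstAvailable(vararg candidates: Pair<String, () -> T>): Pair<String, T> {
    var lastError: Exception? = null
    for ((name, create) in candidates) {
        try {
            return name to create()
        } catch (e: Exception) {
            lastError = e // remember why this backend was rejected, then try the next
        }
    }
    throw IllegalStateException("no backend could be created", lastError)
}

fun main() {
    // The first factory simulates a device where the accelerated path fails;
    // in a real app it would build the interpreter with a delegate attached.
    val (name, engine) = firstAvailable<String>(
        "gpu" to { throw UnsupportedOperationException("GPU delegate not supported") },
        "cpu" to { "cpu-interpreter" },
    )
    println("$name -> $engine")
}
```

Returning the backend name alongside the result makes it easy to log which path a given device ended up on.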

Keeping it smooth

Run inference off the main thread (a coroutine or background executor) so the UI never freezes — especially for video/camera frames where you process many images per second.
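
As a rough illustration of the background-executor option, the sketch below queues each input onto a single worker thread so the calling thread never blocks on inference. BackgroundRunner is a hypothetical helper, and the lambda passed to it is a stand-in for a real interpreter.run(...) call; the code uses only java.util.concurrent, so it runs on plain Kotlin/JVM.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

/** Runs a blocking inference function on one dedicated background thread. */
class BackgroundRunner<I, O>(private val infer: (I) -> O) : AutoCloseable {
    private val executor = Executors.newSingleThreadExecutor()

    /** Queues one input; onResult is invoked on the worker thread, not the caller's. */
    fun submit(input: I, onResult: (O) -> Unit) {
        executor.execute { onResult(infer(input)) }
    }

    /** Stops accepting work and waits briefly for queued inputs to finish. */
    override fun close() {
        executor.shutdown()
        executor.awaitTermination(5, TimeUnit.SECONDS)
    }
}

fun main() {
    val runner = BackgroundRunner<Int, String> { frame ->
        // Stand-in for real inference on one camera frame.
        "frame $frame handled on ${Thread.currentThread().name}"
    }
    repeat(3) { frame -> runner.submit(frame) { result -> println(result) } }
    runner.close()
}
```

A single worker thread keeps frames in order and avoids two inferences running on the same model at once. On Android, the callback would still need to hand its result back to the main thread before touching any views.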

Common mistakes

  • Wrong input size/normalization (garbage output) — match the model spec.
  • Running inference on the main thread (janky UI).
  • Bundling a huge model that bloats the app: optimise it (covered in a later lesson).

Summary: TensorFlow Lite/LiteRT runs ML models on-device on Android. Bundle a .tflite file, use the Task Library for common jobs, accelerate with GPU/NPU delegates, and always run inference off the main thread.